Assembly Handle Optimization¶
The slat handle match evolver module provides access to our evolutionary algorithm for intelligently assigning assembly handles to a megastructure design. The below are the main functions and classes available in this module.
Handle Evolution¶
Core evolutionary algorithm implementation for handle sequence evolution.
crisscross.slat_handle_match_evolver.handle_evolution ¶
EvolveManager ¶
EvolveManager(
megastructure: Megastructure,
random_seed=8,
generational_survivors=3,
mutation_rate=5,
mutation_type_probabilities=(0.425, 0.425, 0.15),
unique_handle_sequences=32,
evolution_generations=200,
evolution_population=30,
split_sequence_handles=False,
sequence_split_factor=2,
process_count=None,
early_max_valency_stop=None,
log_tracking_directory=None,
progress_bar_update_iterations=2,
mutation_memory_system="off",
memory_length=10,
repeating_unit_constraints=None,
similarity_score_calculation_frequency=10,
update_scope="interfaces",
)
Prepares an evolution manager to optimize a handle array for the provided slat array. WARNING: Make sure to use the "if name == 'main':" block to run this class in a script. Otherwise, the spawned processes will cause a recursion error.
| PARAMETER | DESCRIPTION |
|---|---|
megastructure
|
Megastructure object containing slat array and optionally a seed handle array for starting evolution
TYPE:
|
random_seed
|
Random seed to use to ensure consistency
DEFAULT:
|
generational_survivors
|
Number of surviving candidate arrays that persist through each generation
DEFAULT:
|
mutation_rate
|
The expected number of mutations per iteration
DEFAULT:
|
mutation_type_probabilities
|
Probability of selecting a specific mutation type for a target handle/antihandle (either handle, antihandle or mixed mutations)
DEFAULT:
|
unique_handle_sequences
|
Handle library length
DEFAULT:
|
evolution_generations
|
Number of generations to consider before stopping
DEFAULT:
|
evolution_population
|
Number of handle arrays to mutate in each generation
DEFAULT:
|
split_sequence_handles
|
Set to true to enforce the splitting of handle sequences between subsequent layers
DEFAULT:
|
sequence_split_factor
|
Factor by which to split the handle sequences between layers (default is 2, which means that if handles are split, the first layer will have 1/2 of the handles, etc.)
DEFAULT:
|
process_count
|
Number of threads to use for hamming multiprocessing (if set to default, will use 67% of available cores)
DEFAULT:
|
early_max_valency_stop
|
If this worst match score is achieved, the evolution will stop early
DEFAULT:
|
log_tracking_directory
|
Set to a directory to export plots and metrics during the optimization process (optional)
DEFAULT:
|
progress_bar_update_iterations
|
Number of iterations before progress bar is updated - useful for server output files, but does not seem to work consistently on every system (optional)
DEFAULT:
|
mutation_memory_system
|
The type of memory system to use for the handle mutation process. Options are 'all', 'best', 'special', or 'off'.
DEFAULT:
|
memory_length
|
Memory of previous 'worst' handle combinations to retain when selecting positions to mutate.
DEFAULT:
|
repeating_unit_constraints
|
Dictionary to define handle link constraints on handle mutations (mainly for use with repeating unit designs). This is largely depracted now in favour of the Megastructure recursive handle linking system.
DEFAULT:
|
similarity_score_calculation_frequency
|
The duplication risk score will be calculated every x generations. This helps speed up the evolution process, since it isn't used for deciding on the best generations to retain.
DEFAULT:
|
update_scope
|
Either 'interfaces' (default) to only place handles at layer interfaces, or 'all' to also place handles at positions with existing handles in the seed array.
DEFAULT:
|
single_evolution_step ¶
Performs a single evolution step, evaluating all candidate arrays and preparing new mutations for the next generation.
| RETURNS | DESCRIPTION |
|---|---|
|
|
run_full_experiment ¶
Runs a full evolution experiment.
| PARAMETER | DESCRIPTION |
|---|---|
logging_interval
|
The frequency at which logs should be written to file (including the best hamming array file).
DEFAULT:
|
suppress_handle_array_export
|
If true, the best handle array will not be exported at each logging interval, saving space.
DEFAULT:
|
save_first
|
If true, saves the best handle array from the initial random population as 'best_handle_array_initial.xlsx'.
DEFAULT:
|
ensure_pool ¶
Create a new Pool if none exists or it is not RUN. Safe to call at the beginning of each generation or on resume.
Handle Mutation¶
Mutation operators for genetic algorithm optimization.
crisscross.slat_handle_match_evolver.handle_mutation ¶
mutate_handle_arrays ¶
mutate_handle_arrays(
slat_array,
candidate_handle_arrays,
hallofshame,
best_score_indices,
unique_sequences=32,
memory_hallofshame=None,
memory_best_parent_hallofshame=None,
special_hallofshame=None,
mutation_rate=2.0,
mutation_type_probabilities=(0.425, 0.425, 0.15),
use_memory_type=None,
split_sequence_handles=False,
sequence_split_factor=2,
repeating_unit_constraints=None,
mutation_mask=None,
)
Mutates (randomizes handles) a set of candidate arrays into a new generation, while retaining the best scoring arrays from the previous generation.
| PARAMETER | DESCRIPTION |
|---|---|
slat_array
|
Base slat array for design
|
candidate_handle_arrays
|
Set of candidate handle arrays from previous generation
|
hallofshame
|
Worst handle/antihandle combinations from previous generation
|
best_score_indices
|
The indices of the best scoring arrays from the previous generation
|
unique_sequences
|
Total length of handle library available
DEFAULT:
|
memory_hallofshame
|
List of all the worst handle/antihandle combinations from previous generations
DEFAULT:
|
memory_best_parent_hallofshame
|
List of the worst handle/antihandle combinations from previous generations (only the ones linked to the best parents)
DEFAULT:
|
special_hallofshame
|
List of special handle/antihandle combinations from previous generations
DEFAULT:
|
mutation_rate
|
The expected number of mutations per cycle
DEFAULT:
|
mutation_type_probabilities
|
Probability of selecting a specific mutation type for a target handle/antihandle (either handle, antihandle or mixed mutations)
DEFAULT:
|
use_memory_type
|
Select memory type to use for mutation selection ('off', 'all', 'best_memory', 'special')
DEFAULT:
|
split_sequence_handles
|
Set to true if the handle library needs to be split between subsequent layers
DEFAULT:
|
sequence_split_factor
|
The number of layers to split the handle library between (default is 2, which means a single layer would have half the available library)
DEFAULT:
|
repeating_unit_constraints
|
Dictionary of handles to link together (mostly deprecated in favour of recursive algorithm in Megastructure). Syntax is {(layer, 'top'/'bottom', (x,y)): (layer, 'top'/'bottom', (x,y))}
DEFAULT:
|
mutation_mask
|
Optional integer mask to inform mutation system of any handle links in the slat design (created through recursion system)
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
New generation of handle arrays to be screened |
Utility Functions¶
Utility functions for generating random sequences.
crisscross.slat_handle_match_evolver ¶
generate_random_slat_handles ¶
generate_random_slat_handles(
base_array, unique_sequences=32, link_handles={}, additional_positions=None
)
Generates an array of handles, all randomly selected.
| PARAMETER | DESCRIPTION |
|---|---|
base_array
|
Megastructure handle positions in a 3D array
|
unique_sequences
|
Number of possible handle sequences
DEFAULT:
|
link_handles
|
Dictionary of handles to link together (mostly deprecated in favour of recursive algorithm in Megastructure). Syntax is {(layer, 'top'/'bottom', (x,y)): (layer, 'top'/'bottom', (x,y))}
DEFAULT:
|
additional_positions
|
Optional set of (x, y, z) tuples indicating additional positions where handles should be placed beyond interface positions
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
2D array with handle IDs |
generate_layer_split_handles ¶
generate_layer_split_handles(
base_array,
unique_sequences=32,
split_factor=2,
link_handles={},
additional_positions=None,
)
Generates an array of handles, with the possible ids split between each layer, with the goal of preventing a single slat from being self-complementary.
| PARAMETER | DESCRIPTION |
|---|---|
base_array
|
Megastructure handle positions in a 3D array
|
unique_sequences
|
Number of possible handle sequences
DEFAULT:
|
split_factor
|
Number of layers to split the handle sequences between
DEFAULT:
|
link_handles
|
Dictionary of handles to link together (mostly deprecated in favour of recursive algorithm in Megastructure). Syntax is {(layer, 'top'/'bottom', (x,y)): (layer, 'top'/'bottom', (x,y))}
DEFAULT:
|
additional_positions
|
Optional set of (x, y, z) tuples indicating additional positions where handles should be placed beyond interface positions
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
2D array with handle IDs |
update_split_slat_handles ¶
Updates the split handle array with new random values inplace
| PARAMETER | DESCRIPTION |
|---|---|
handle_array
|
Pre-populated split handle array
|
unique_sequences
|
Max number of unique sequences
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
N/A |
update_random_slat_handles ¶
Updates the handle array with new random values inplace
| PARAMETER | DESCRIPTION |
|---|---|
handle_array
|
Pre-populated handle array
|
unique_sequences
|
Max number of unique sequences
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
N/A |
Random Hamming Optimizer¶
Random search baseline algorithm.
crisscross.slat_handle_match_evolver.random_hamming_optimizer ¶
generate_handle_set_and_optimize ¶
generate_handle_set_and_optimize(
base_array,
unique_sequences=32,
slat_length=32,
max_rounds=30,
split_sequence_handles=False,
universal_hamming=True,
layer_hamming=False,
group_hamming=None,
metric_to_optimize="Universal",
)
Generates random handle sets and attempts to choose the best set based on the hamming distance between slat assembly handles.
| PARAMETER | DESCRIPTION |
|---|---|
base_array
|
Slat position array (3D)
|
unique_sequences
|
Max unique sequences in the handle array
DEFAULT:
|
slat_length
|
Length of a single slat
DEFAULT:
|
max_rounds
|
Maximum number of rounds to run the check
DEFAULT:
|
split_sequence_handles
|
Set to true to split the handle sequences between layers evenly
DEFAULT:
|
universal_hamming
|
Set to true to compute the hamming distance for the entire set of slats
DEFAULT:
|
layer_hamming
|
Set to true to compute the hamming distance for the interface between each layer
DEFAULT:
|
group_hamming
|
Provide a dictionary, where the key is a group name and the value is a list of tuples containing the layer and slat ID of the slats in the group for which the specific hamming distance is being requested.
DEFAULT:
|
metric_to_optimize
|
The metric to optimize for (Universal, Layer X or Group ID)
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
2D array with handle IDs |
Older Tubular Slat Match Compute¶
Functions for calculating Hamming distances (our previous naming convention for parasitic interactions) and sequence metrics for tubular slat matching.
crisscross.slat_handle_match_evolver.tubular_slat_match_compute ¶
extract_handle_dicts ¶
Extracts all slats from a design and organizes them into dictionaries of handles and antihandles. If list_indices_only is set to True, the function will return lists of indices instead of the actual handle values.
| PARAMETER | DESCRIPTION |
|---|---|
handle_array
|
Array of XxYxZ-1 dimensions containing the IDs of all the handles in the design
|
slat_array
|
Array of XxYxZ dimensions containing the positions of all slats in the design
|
list_indices_only
|
Set to True to return lists of indices instead of the actual handle values
DEFAULT:
|
custom_index_map
|
A dictionary mapping (layer, slat_id) tuples to specific indices in the handle_array.
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Two dictionaries or lists containing the handle sequences (or indices) for each slat - one for handles and one for antihandles |
oneshot_hamming_compute ¶
Given a dictionary of slat handles and antihandles, this function computes the hamming distance between all possible combinations. This is the fastest implementation available, making full use of Numpy's efficient vector computation.
| PARAMETER | DESCRIPTION |
|---|---|
handle_dict
|
Dictionary of handles i.e. {slat_id: slat_handle_array}
|
antihandle_dict
|
Dictionary of antihandles i.e. {slat_id: slat_antihandle_array}
|
slat_length
|
The length of a single slat (must be an integer)
|
| RETURNS | DESCRIPTION |
|---|---|
|
Array of noflank_results for each possible combination (a single integer per combination) |
multirule_oneshot_hamming ¶
multirule_oneshot_hamming(
slat_array,
handle_array,
report_worst_slat_combinations=True,
per_layer_check=False,
specific_slat_groups=None,
request_substitute_risk_score=False,
slat_length=32,
partial_area_score=False,
return_match_histogram=False,
)
Given a slat and handle array, this function computes the hamming distance of all handle/antihandle combinations provided. Scores for individual components, such as specific slat groups, can also be requested. Note: This function is our current fastest implementation. Due to its optimization, it cannot be instructed to compute lesser combinations of slats in the design - it will compute all hamming noflank_results at once.
Implementation comment: Requesting the similarity score requires that more hamming combinations are computed. This will of course slow down the function by a factor of 3 (approx). We tried to reduce this speed loss by combining the handles and antihandles into a duplicate array, which computes the following combinations: - handle vs handle - antihandle vs antihandle - handle vs antihandle (duplicated twice) However, the end result was a minor slowdown rather than speed increase. We suspect that the additional computations forced by the duplicated handle vs antihandle combination outweights the speed increase by computing the arrays all in one go. It might be possible to improve this further, but we do not think the solution is obvious (or potentially worth the time investment).
| PARAMETER | DESCRIPTION |
|---|---|
slat_array
|
Array of XxYxZ dimensions, where X and Y are the dimensions of the design and Z is the number of layers in the design
|
handle_array
|
Array of XxYxZ-1 dimensions containing the IDs of all the handles in the design
|
report_worst_slat_combinations
|
Set to true to provide the IDs of the worst handle/antihandle slat combinations
DEFAULT:
|
per_layer_check
|
Set to true to provide a hamming score for the individual layers of the design (i.e. the interface between each layer)
DEFAULT:
|
specific_slat_groups
|
Provide a dictionary, where the key is a group name and the value is a list of tuples containing the layer and slat ID of the slats in the group for which the specific hamming distance is being requested.
DEFAULT:
|
slat_length
|
The length of a single slat (must be an integer)
DEFAULT:
|
request_substitute_risk_score
|
Set to true to provide a measure of the largest amount of handle duplication between slats of the same type (handle or antihandle)
DEFAULT:
|
partial_area_score
|
Calculates Hamming distance and substitution risk among a subset of provided slats when only considering a subset of the handles. Provide a dictionary with key as a group name and the values as dictionarys with keys "handle" and "antihandle". The corresponding values are dictionaries, where the key is a tuple like so (slat layer, slat ID) and the value is a list of TRUE/FALSE depending on whether that position's handle is included.
DEFAULT:
|
return_match_histogram
|
If True, returns a histogram of the number of matches of each type (0 matches, 1 match, etc.).
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Dictionary of scores (or slat layer/handle IDS for the worst slat combinations) for each of the slat combinations requested from the design |
precise_hamming_compute ¶
Given a dictionary of slat handles and antihandles, this function computes the hamming distance between all possible combinations. It can be restricted to certain combinations by providing a list of valid product indices. This is not the fastest hamming implementation available, but does use numpy vectorization to reduce runtime.
| PARAMETER | DESCRIPTION |
|---|---|
handle_dict
|
Dictionary of handles i.e. {slat_id: slat_handle_array}
|
antihandle_dict
|
Dictionary of antihandles i.e. {slat_id: slat_antihandle_array}
|
valid_product_indices
|
A list of indices matching the possible products that should be computed (i.e. if a product is not being requested, the index should be False)
|
slat_length
|
The length of a single slat (must be an integer)
|
| RETURNS | DESCRIPTION |
|---|---|
|
Array of noflank_results for each possible combination (a single integer per combination) |
multirule_precise_hamming ¶
multirule_precise_hamming(
slat_array,
handle_array,
universal_check=True,
per_layer_check=False,
specific_slat_groups=None,
slat_length=32,
request_substitute_risk_score=False,
partial_area_score=False,
)
Given a slat and handle array, this function computes the hamming distance of all handle/antihandle combinations provided. Scores for individual components, such as specific slat groups, can also be requested. This does not use the fastest hamming calculation implementation, but can be configured to restrict the number of combinations computed.
| PARAMETER | DESCRIPTION |
|---|---|
slat_array
|
Array of XxYxZ dimensions, where X and Y are the dimensions of the design and Z is the number of layers in the design
|
handle_array
|
Array of XxYxZ-1 dimensions containing the IDs of all the handles in the design
|
universal_check
|
Set to true to provide a hamming score for the entire set of slats in the design
DEFAULT:
|
per_layer_check
|
Set to true to provide a hamming score for the individual layers of the design (i.e. the interface between each layer)
DEFAULT:
|
specific_slat_groups
|
Provide a dictionary, where the key is a group name and the value is a list of tuples containing the layer and slat ID of the slats in the group for which the specific hamming distance is being requested.
DEFAULT:
|
slat_length
|
The length of a single slat (must be an integer)
DEFAULT:
|
request_substitute_risk_score
|
Set to true to provide a measure of the largest amount of handle duplication between slats of the same type (handle or antihandle)
DEFAULT:
|
partial_area_score
|
Calculates Hamming distance and substitution risk among a subset of provided slats when only considering a subset of the handles. Provide a dictionary with key as a group name and the values as dictionaries with keys "handle" and "antihandle". The corresponding values are dictionaries, where the key is a tuple like so (slat layer, slat ID) and the value is a list of TRUE/FALSE depending on whether that position's handle is included.
DEFAULT:
|
| RETURNS | DESCRIPTION |
|---|---|
|
Dictionary of scores for each of the aspects requested from the design |
Optuna Integration (mostly deprecated, but can be revived if necessary)¶
Hyperparameter optimization using the Optuna framework (optional, mainly for debugging). Requires optuna to be installed.
crisscross.slat_handle_match_evolver.handle_evolve_with_optuna ¶
OptunaEvolveManager ¶
Bases: EvolveManager
An EvolveManager that is designed to be used with Optuna for optimizing the best hyperparameters. Inherits all other methods from the typical evolution manager.
run_full_experiment ¶
Runs a full evolution experiment.
| PARAMETER | DESCRIPTION |
|---|---|
logging_interval
|
The frequency at which logs should be written to file (including the best hamming array file).
DEFAULT:
|
single_evolution_step ¶
Performs a single evolution step, evaluating all candidate arrays and preparing new mutations for the next generation.
| RETURNS | DESCRIPTION |
|---|---|
|
|
ensure_pool ¶
Create a new Pool if none exists or it is not RUN. Safe to call at the beginning of each generation or on resume.