Skip to content

Assembly Handle Optimization

The slat handle match evolver module provides access to our evolutionary algorithm for intelligently assigning assembly handles to a megastructure design. The below are the main functions and classes available in this module.

Handle Evolution

Core evolutionary algorithm implementation for handle sequence evolution.

crisscross.slat_handle_match_evolver.handle_evolution

EvolveManager

EvolveManager(
    megastructure: Megastructure,
    random_seed=8,
    generational_survivors=3,
    mutation_rate=5,
    mutation_type_probabilities=(0.425, 0.425, 0.15),
    unique_handle_sequences=32,
    evolution_generations=200,
    evolution_population=30,
    split_sequence_handles=False,
    sequence_split_factor=2,
    process_count=None,
    early_max_valency_stop=None,
    log_tracking_directory=None,
    progress_bar_update_iterations=2,
    mutation_memory_system="off",
    memory_length=10,
    repeating_unit_constraints=None,
    similarity_score_calculation_frequency=10,
    update_scope="interfaces",
)

Prepares an evolution manager to optimize a handle array for the provided slat array. WARNING: Make sure to use the "if name == 'main':" block to run this class in a script. Otherwise, the spawned processes will cause a recursion error.

PARAMETER DESCRIPTION
megastructure

Megastructure object containing slat array and optionally a seed handle array for starting evolution

TYPE: Megastructure

random_seed

Random seed to use to ensure consistency

DEFAULT: 8

generational_survivors

Number of surviving candidate arrays that persist through each generation

DEFAULT: 3

mutation_rate

The expected number of mutations per iteration

DEFAULT: 5

mutation_type_probabilities

Probability of selecting a specific mutation type for a target handle/antihandle (either handle, antihandle or mixed mutations)

DEFAULT: (0.425, 0.425, 0.15)

unique_handle_sequences

Handle library length

DEFAULT: 32

evolution_generations

Number of generations to consider before stopping

DEFAULT: 200

evolution_population

Number of handle arrays to mutate in each generation

DEFAULT: 30

split_sequence_handles

Set to true to enforce the splitting of handle sequences between subsequent layers

DEFAULT: False

sequence_split_factor

Factor by which to split the handle sequences between layers (default is 2, which means that if handles are split, the first layer will have 1/2 of the handles, etc.)

DEFAULT: 2

process_count

Number of threads to use for hamming multiprocessing (if set to default, will use 67% of available cores)

DEFAULT: None

early_max_valency_stop

If this worst match score is achieved, the evolution will stop early

DEFAULT: None

log_tracking_directory

Set to a directory to export plots and metrics during the optimization process (optional)

DEFAULT: None

progress_bar_update_iterations

Number of iterations before progress bar is updated - useful for server output files, but does not seem to work consistently on every system (optional)

DEFAULT: 2

mutation_memory_system

The type of memory system to use for the handle mutation process. Options are 'all', 'best', 'special', or 'off'.

DEFAULT: 'off'

memory_length

Memory of previous 'worst' handle combinations to retain when selecting positions to mutate.

DEFAULT: 10

repeating_unit_constraints

Dictionary to define handle link constraints on handle mutations (mainly for use with repeating unit designs). This is largely depracted now in favour of the Megastructure recursive handle linking system.

DEFAULT: None

similarity_score_calculation_frequency

The duplication risk score will be calculated every x generations. This helps speed up the evolution process, since it isn't used for deciding on the best generations to retain.

DEFAULT: 10

update_scope

Either 'interfaces' (default) to only place handles at layer interfaces, or 'all' to also place handles at positions with existing handles in the seed array.

DEFAULT: 'interfaces'

initialize_evolution

initialize_evolution()

Initializes the pool of candidate handle arrays.

single_evolution_step

single_evolution_step()

Performs a single evolution step, evaluating all candidate arrays and preparing new mutations for the next generation.

RETURNS DESCRIPTION

run_full_experiment

run_full_experiment(
    logging_interval=10, suppress_handle_array_export=False, save_first=False
)

Runs a full evolution experiment.

PARAMETER DESCRIPTION
logging_interval

The frequency at which logs should be written to file (including the best hamming array file).

DEFAULT: 10

suppress_handle_array_export

If true, the best handle array will not be exported at each logging interval, saving space.

DEFAULT: False

save_first

If true, saves the best handle array from the initial random population as 'best_handle_array_initial.xlsx'.

DEFAULT: False

ensure_pool

ensure_pool()

Create a new Pool if none exists or it is not RUN. Safe to call at the beginning of each generation or on resume.

Handle Mutation

Mutation operators for genetic algorithm optimization.

crisscross.slat_handle_match_evolver.handle_mutation

mutate_handle_arrays

mutate_handle_arrays(
    slat_array,
    candidate_handle_arrays,
    hallofshame,
    best_score_indices,
    unique_sequences=32,
    memory_hallofshame=None,
    memory_best_parent_hallofshame=None,
    special_hallofshame=None,
    mutation_rate=2.0,
    mutation_type_probabilities=(0.425, 0.425, 0.15),
    use_memory_type=None,
    split_sequence_handles=False,
    sequence_split_factor=2,
    repeating_unit_constraints=None,
    mutation_mask=None,
)

Mutates (randomizes handles) a set of candidate arrays into a new generation, while retaining the best scoring arrays from the previous generation.

PARAMETER DESCRIPTION
slat_array

Base slat array for design

candidate_handle_arrays

Set of candidate handle arrays from previous generation

hallofshame

Worst handle/antihandle combinations from previous generation

best_score_indices

The indices of the best scoring arrays from the previous generation

unique_sequences

Total length of handle library available

DEFAULT: 32

memory_hallofshame

List of all the worst handle/antihandle combinations from previous generations

DEFAULT: None

memory_best_parent_hallofshame

List of the worst handle/antihandle combinations from previous generations (only the ones linked to the best parents)

DEFAULT: None

special_hallofshame

List of special handle/antihandle combinations from previous generations

DEFAULT: None

mutation_rate

The expected number of mutations per cycle

DEFAULT: 2.0

mutation_type_probabilities

Probability of selecting a specific mutation type for a target handle/antihandle (either handle, antihandle or mixed mutations)

DEFAULT: (0.425, 0.425, 0.15)

use_memory_type

Select memory type to use for mutation selection ('off', 'all', 'best_memory', 'special')

DEFAULT: None

split_sequence_handles

Set to true if the handle library needs to be split between subsequent layers

DEFAULT: False

sequence_split_factor

The number of layers to split the handle library between (default is 2, which means a single layer would have half the available library)

DEFAULT: 2

repeating_unit_constraints

Dictionary of handles to link together (mostly deprecated in favour of recursive algorithm in Megastructure). Syntax is {(layer, 'top'/'bottom', (x,y)): (layer, 'top'/'bottom', (x,y))}

DEFAULT: None

mutation_mask

Optional integer mask to inform mutation system of any handle links in the slat design (created through recursion system)

DEFAULT: None

RETURNS DESCRIPTION

New generation of handle arrays to be screened

Utility Functions

Utility functions for generating random sequences.

crisscross.slat_handle_match_evolver

generate_random_slat_handles

generate_random_slat_handles(
    base_array, unique_sequences=32, link_handles={}, additional_positions=None
)

Generates an array of handles, all randomly selected.

PARAMETER DESCRIPTION
base_array

Megastructure handle positions in a 3D array

unique_sequences

Number of possible handle sequences

DEFAULT: 32

link_handles

Dictionary of handles to link together (mostly deprecated in favour of recursive algorithm in Megastructure). Syntax is {(layer, 'top'/'bottom', (x,y)): (layer, 'top'/'bottom', (x,y))}

DEFAULT: {}

additional_positions

Optional set of (x, y, z) tuples indicating additional positions where handles should be placed beyond interface positions

DEFAULT: None

RETURNS DESCRIPTION

2D array with handle IDs

generate_layer_split_handles

generate_layer_split_handles(
    base_array,
    unique_sequences=32,
    split_factor=2,
    link_handles={},
    additional_positions=None,
)

Generates an array of handles, with the possible ids split between each layer, with the goal of preventing a single slat from being self-complementary.

PARAMETER DESCRIPTION
base_array

Megastructure handle positions in a 3D array

unique_sequences

Number of possible handle sequences

DEFAULT: 32

split_factor

Number of layers to split the handle sequences between

DEFAULT: 2

link_handles

Dictionary of handles to link together (mostly deprecated in favour of recursive algorithm in Megastructure). Syntax is {(layer, 'top'/'bottom', (x,y)): (layer, 'top'/'bottom', (x,y))}

DEFAULT: {}

additional_positions

Optional set of (x, y, z) tuples indicating additional positions where handles should be placed beyond interface positions

DEFAULT: None

RETURNS DESCRIPTION

2D array with handle IDs

update_split_slat_handles

update_split_slat_handles(handle_array, unique_sequences=32)

Updates the split handle array with new random values inplace

PARAMETER DESCRIPTION
handle_array

Pre-populated split handle array

unique_sequences

Max number of unique sequences

DEFAULT: 32

RETURNS DESCRIPTION

N/A

update_random_slat_handles

update_random_slat_handles(handle_array, unique_sequences=32)

Updates the handle array with new random values inplace

PARAMETER DESCRIPTION
handle_array

Pre-populated handle array

unique_sequences

Max number of unique sequences

DEFAULT: 32

RETURNS DESCRIPTION

N/A

Random Hamming Optimizer

Random search baseline algorithm.

crisscross.slat_handle_match_evolver.random_hamming_optimizer

generate_handle_set_and_optimize

generate_handle_set_and_optimize(
    base_array,
    unique_sequences=32,
    slat_length=32,
    max_rounds=30,
    split_sequence_handles=False,
    universal_hamming=True,
    layer_hamming=False,
    group_hamming=None,
    metric_to_optimize="Universal",
)

Generates random handle sets and attempts to choose the best set based on the hamming distance between slat assembly handles.

PARAMETER DESCRIPTION
base_array

Slat position array (3D)

unique_sequences

Max unique sequences in the handle array

DEFAULT: 32

slat_length

Length of a single slat

DEFAULT: 32

max_rounds

Maximum number of rounds to run the check

DEFAULT: 30

split_sequence_handles

Set to true to split the handle sequences between layers evenly

DEFAULT: False

universal_hamming

Set to true to compute the hamming distance for the entire set of slats

DEFAULT: True

layer_hamming

Set to true to compute the hamming distance for the interface between each layer

DEFAULT: False

group_hamming

Provide a dictionary, where the key is a group name and the value is a list of tuples containing the layer and slat ID of the slats in the group for which the specific hamming distance is being requested.

DEFAULT: None

metric_to_optimize

The metric to optimize for (Universal, Layer X or Group ID)

DEFAULT: 'Universal'

RETURNS DESCRIPTION

2D array with handle IDs

Older Tubular Slat Match Compute

Functions for calculating Hamming distances (our previous naming convention for parasitic interactions) and sequence metrics for tubular slat matching.

crisscross.slat_handle_match_evolver.tubular_slat_match_compute

extract_handle_dicts

extract_handle_dicts(
    handle_array, slat_array, list_indices_only=False, custom_index_map=None
)

Extracts all slats from a design and organizes them into dictionaries of handles and antihandles. If list_indices_only is set to True, the function will return lists of indices instead of the actual handle values.

PARAMETER DESCRIPTION
handle_array

Array of XxYxZ-1 dimensions containing the IDs of all the handles in the design

slat_array

Array of XxYxZ dimensions containing the positions of all slats in the design

list_indices_only

Set to True to return lists of indices instead of the actual handle values

DEFAULT: False

custom_index_map

A dictionary mapping (layer, slat_id) tuples to specific indices in the handle_array.

DEFAULT: None

RETURNS DESCRIPTION

Two dictionaries or lists containing the handle sequences (or indices) for each slat - one for handles and one for antihandles

oneshot_hamming_compute

oneshot_hamming_compute(handle_dict, antihandle_dict, slat_length)

Given a dictionary of slat handles and antihandles, this function computes the hamming distance between all possible combinations. This is the fastest implementation available, making full use of Numpy's efficient vector computation.

PARAMETER DESCRIPTION
handle_dict

Dictionary of handles i.e. {slat_id: slat_handle_array}

antihandle_dict

Dictionary of antihandles i.e. {slat_id: slat_antihandle_array}

slat_length

The length of a single slat (must be an integer)

RETURNS DESCRIPTION

Array of noflank_results for each possible combination (a single integer per combination)

multirule_oneshot_hamming

multirule_oneshot_hamming(
    slat_array,
    handle_array,
    report_worst_slat_combinations=True,
    per_layer_check=False,
    specific_slat_groups=None,
    request_substitute_risk_score=False,
    slat_length=32,
    partial_area_score=False,
    return_match_histogram=False,
)

Given a slat and handle array, this function computes the hamming distance of all handle/antihandle combinations provided. Scores for individual components, such as specific slat groups, can also be requested. Note: This function is our current fastest implementation. Due to its optimization, it cannot be instructed to compute lesser combinations of slats in the design - it will compute all hamming noflank_results at once.

Implementation comment: Requesting the similarity score requires that more hamming combinations are computed. This will of course slow down the function by a factor of 3 (approx). We tried to reduce this speed loss by combining the handles and antihandles into a duplicate array, which computes the following combinations: - handle vs handle - antihandle vs antihandle - handle vs antihandle (duplicated twice) However, the end result was a minor slowdown rather than speed increase. We suspect that the additional computations forced by the duplicated handle vs antihandle combination outweights the speed increase by computing the arrays all in one go. It might be possible to improve this further, but we do not think the solution is obvious (or potentially worth the time investment).

PARAMETER DESCRIPTION
slat_array

Array of XxYxZ dimensions, where X and Y are the dimensions of the design and Z is the number of layers in the design

handle_array

Array of XxYxZ-1 dimensions containing the IDs of all the handles in the design

report_worst_slat_combinations

Set to true to provide the IDs of the worst handle/antihandle slat combinations

DEFAULT: True

per_layer_check

Set to true to provide a hamming score for the individual layers of the design (i.e. the interface between each layer)

DEFAULT: False

specific_slat_groups

Provide a dictionary, where the key is a group name and the value is a list of tuples containing the layer and slat ID of the slats in the group for which the specific hamming distance is being requested.

DEFAULT: None

slat_length

The length of a single slat (must be an integer)

DEFAULT: 32

request_substitute_risk_score

Set to true to provide a measure of the largest amount of handle duplication between slats of the same type (handle or antihandle)

DEFAULT: False

partial_area_score

Calculates Hamming distance and substitution risk among a subset of provided slats when only considering a subset of the handles. Provide a dictionary with key as a group name and the values as dictionarys with keys "handle" and "antihandle". The corresponding values are dictionaries, where the key is a tuple like so (slat layer, slat ID) and the value is a list of TRUE/FALSE depending on whether that position's handle is included.

DEFAULT: False

return_match_histogram

If True, returns a histogram of the number of matches of each type (0 matches, 1 match, etc.).

DEFAULT: False

RETURNS DESCRIPTION

Dictionary of scores (or slat layer/handle IDS for the worst slat combinations) for each of the slat combinations requested from the design

precise_hamming_compute

precise_hamming_compute(
    handle_dict, antihandle_dict, valid_product_indices, slat_length
)

Given a dictionary of slat handles and antihandles, this function computes the hamming distance between all possible combinations. It can be restricted to certain combinations by providing a list of valid product indices. This is not the fastest hamming implementation available, but does use numpy vectorization to reduce runtime.

PARAMETER DESCRIPTION
handle_dict

Dictionary of handles i.e. {slat_id: slat_handle_array}

antihandle_dict

Dictionary of antihandles i.e. {slat_id: slat_antihandle_array}

valid_product_indices

A list of indices matching the possible products that should be computed (i.e. if a product is not being requested, the index should be False)

slat_length

The length of a single slat (must be an integer)

RETURNS DESCRIPTION

Array of noflank_results for each possible combination (a single integer per combination)

multirule_precise_hamming

multirule_precise_hamming(
    slat_array,
    handle_array,
    universal_check=True,
    per_layer_check=False,
    specific_slat_groups=None,
    slat_length=32,
    request_substitute_risk_score=False,
    partial_area_score=False,
)

Given a slat and handle array, this function computes the hamming distance of all handle/antihandle combinations provided. Scores for individual components, such as specific slat groups, can also be requested. This does not use the fastest hamming calculation implementation, but can be configured to restrict the number of combinations computed.

PARAMETER DESCRIPTION
slat_array

Array of XxYxZ dimensions, where X and Y are the dimensions of the design and Z is the number of layers in the design

handle_array

Array of XxYxZ-1 dimensions containing the IDs of all the handles in the design

universal_check

Set to true to provide a hamming score for the entire set of slats in the design

DEFAULT: True

per_layer_check

Set to true to provide a hamming score for the individual layers of the design (i.e. the interface between each layer)

DEFAULT: False

specific_slat_groups

Provide a dictionary, where the key is a group name and the value is a list of tuples containing the layer and slat ID of the slats in the group for which the specific hamming distance is being requested.

DEFAULT: None

slat_length

The length of a single slat (must be an integer)

DEFAULT: 32

request_substitute_risk_score

Set to true to provide a measure of the largest amount of handle duplication between slats of the same type (handle or antihandle)

DEFAULT: False

partial_area_score

Calculates Hamming distance and substitution risk among a subset of provided slats when only considering a subset of the handles. Provide a dictionary with key as a group name and the values as dictionaries with keys "handle" and "antihandle". The corresponding values are dictionaries, where the key is a tuple like so (slat layer, slat ID) and the value is a list of TRUE/FALSE depending on whether that position's handle is included.

DEFAULT: False

RETURNS DESCRIPTION

Dictionary of scores for each of the aspects requested from the design

Optuna Integration (mostly deprecated, but can be revived if necessary)

Hyperparameter optimization using the Optuna framework (optional, mainly for debugging). Requires optuna to be installed.

crisscross.slat_handle_match_evolver.handle_evolve_with_optuna

OptunaEvolveManager

OptunaEvolveManager(optuna_trial, **kwargs)

Bases: EvolveManager

An EvolveManager that is designed to be used with Optuna for optimizing the best hyperparameters. Inherits all other methods from the typical evolution manager.

run_full_experiment

run_full_experiment(logging_interval=10)

Runs a full evolution experiment.

PARAMETER DESCRIPTION
logging_interval

The frequency at which logs should be written to file (including the best hamming array file).

DEFAULT: 10

initialize_evolution

initialize_evolution()

Initializes the pool of candidate handle arrays.

single_evolution_step

single_evolution_step()

Performs a single evolution step, evaluating all candidate arrays and preparing new mutations for the next generation.

RETURNS DESCRIPTION

ensure_pool

ensure_pool()

Create a new Pool if none exists or it is not RUN. Safe to call at the beginning of each generation or on resume.