EvoMD is an evolutionary optimization framework for peptide sequences. It evolves a population of peptides according to a user-defined fitness function, which is evaluated by simulation modules that the user writes and plugs in. The configuration is stored in a single YAML file, and the state is serialized to evolver.pkl after every step.
The general process begins with a random population that is continuously evaluated and sorted. The worst performers are discarded, and the population is renewed through crossover of the best sequences. The result is an optimized population of sequences.
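The loop described above can be sketched in a few lines. This is a minimal illustration, not EvoMD's implementation: the single-point crossover is a stand-in for EvoMD's own populate methods, and the function names and the `mutate_probability` parameter are hypothetical (the other parameter names simply mirror the configuration keys shown later).

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_peptide(length, rng):
    """Draw a random peptide of the given length."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def crossover(a, b, rng):
    """Single-point crossover (stand-in): head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(seq, rng):
    """Replace one randomly chosen residue."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]


def evolve(fitness, population=32, peptide_len=20, parents_ratio=0.25,
           mutate_probability=0.1, generations=50, seed=0):
    """Maximize `fitness` over peptides; returns the final population, best first."""
    rng = random.Random(seed)
    pool = [random_peptide(peptide_len, rng) for _ in range(population)]
    for _ in range(generations):
        # Evaluate and sort, then keep only the best fraction as parents.
        pool.sort(key=fitness, reverse=True)
        parents = pool[:max(2, int(population * parents_ratio))]
        # Refill the population with (occasionally mutated) children.
        children = []
        while len(parents) + len(children) < population:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutate_probability:
                child = mutate(child, rng)
            children.append(child)
        pool = parents + children
    return sorted(pool, key=fitness, reverse=True)
```

For example, `evolve(lambda s: s.count("L"))` drives the population toward leucine-rich sequences. In EvoMD the fitness comes from the user's analyzer rather than from an in-process function.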
Python 3.10+ required.
pip install numpy pyyaml matplotlib
This example uses test_hm.py (included), which maximizes the hydrophobic moment of 20-residue peptides with no real simulation.
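For readers unfamiliar with the quantity, the hydrophobic moment of an ideal helix can be sketched as below. This is an assumption-laden illustration and may differ from what test_hm.py actually computes: the hydrophobicity values are illustrative (approximating the Eisenberg consensus scale), the 100° angle per residue assumes an α-helix, and the result is normalized by sequence length.

```python
import math

# Illustrative per-residue hydrophobicity values (approximating the
# Eisenberg consensus scale); test_hm.py may use a different scale.
HYDROPHOBICITY = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(sequence: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment per residue for a periodic structure.

    Each residue contributes a vector of length H_i pointing at angle
    i * angle_deg around the helix axis; the moment is the length of the
    vector sum divided by the number of residues.
    """
    angle = math.radians(angle_deg)
    sin_sum = sum(HYDROPHOBICITY[aa] * math.sin(i * angle)
                  for i, aa in enumerate(sequence))
    cos_sum = sum(HYDROPHOBICITY[aa] * math.cos(i * angle)
                  for i, aa in enumerate(sequence))
    return math.hypot(sin_sum, cos_sum) / len(sequence)
```

A sequence whose hydrophobic residues cluster on one face of the helix scores high, while a uniform sequence spanning whole helical turns scores close to zero.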
1. Create an input.yaml:
optimize: maximize # maximize or minimize
population: 32 # how many sequences in the population?
peptide_len: 20 # peptide length
populate_method: swap # hybrids and swap work better
extra_mutation: True # Random point mutation
also_mutate_probability: 0.1 # 10% of the new sequences are mutated
parents_ratio: 0.25 # Parents are chosen from the best 25% of the population
# External methods: give the name of the Python script that provides each one.
constructor: test_hm # script for constructing the simulation box.
calculator: test_hm # script for running and checking simulations.
analyzer: test_hm # script for analysis and computation of fitness values.
sleep_time: 0
max_check_cycle: 50
max_generations: 50
hydrophobic_restriction: False # change to True for
hydrophobic_min: 5.5 # ignored while hydrophobic_restriction is False
hydrophobic_max: None # no max limit
2. Create the Evolver:
python evo-md.py --file input.yaml --create-evolver
3. Run the evolution:
python evo-md.py --file input.yaml --start
4. Inspect the results:
python evo-md.py --show-evolver
python evo-md.py --report-sequences # writes sequences_report.csv
python evo-md.py --plot-evolution --show-kids # plots and saves as evolution.png
Each generation EvoMD:
1. Constructs the simulation box (constructor_method).
2. Runs the simulation (calculator_method).
3. Checks if simulations finished (calculator_check).
4. Analyzes results and computes fitness (analyzer_method).
5. Sorts sequences, keeps the best as parents, and generates the next generation.
Users control steps 1–4 by writing a Python plug-in with four functions and naming it in the YAML (see next section). EvoMD handles the rest.
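How the four plug-in functions might be driven in one generation can be sketched as follows. This is not EvoMD's code: `run_generation` and the `fitness` attribute are hypothetical names, the sketch does not change into per-sequence directories, and marking a sequence as failed when the polling budget runs out is an assumption about what happens in that case. The `sleep_time` and `max_check_cycle` parameters mirror the configuration keys.

```python
import time


def run_generation(sequences, constructor, calculator, analyzer,
                   sleep_time=0, max_check_cycle=50):
    """Run one generation and return the finished sequences, best first."""
    # Steps 1-2: build every system and launch every job.
    for seq in sequences:
        constructor.constructor_method(seq)
        calculator.calculator_method(seq)

    # Step 3: poll until all jobs report done or the cycle budget runs out.
    pending = list(sequences)
    for _ in range(max_check_cycle):
        pending = [s for s in pending if not calculator.calculator_check(s)]
        if not pending:
            break
        time.sleep(sleep_time)

    # Assumption: jobs that never finished are flagged as failed.
    for seq in pending:
        seq.is_failed = True

    # Step 4: compute fitness for everything that finished.
    finished = [s for s in sequences if not getattr(s, "is_failed", False)]
    for seq in finished:
        seq.fitness = analyzer.analyzer_method(seq)

    # Step 5 (first half): rank the sequences so parents can be chosen.
    return sorted(finished, key=lambda s: s.fitness, reverse=True)
```

The three module arguments can be the same object, which matches the case where all three YAML keys point to one file.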
The general idea of the modular architecture is shown in the next figure:
Users must create one or more Python modules that provide these four functions and name them in the YAML (constructor, calculator, analyzer). All three keys can point to the same file.
calculator_method and calculator_check must be in the same Python file.
def constructor_method(sequence) -> None:
    # Write input files, build the system, etc.
    pass

def calculator_method(sequence) -> None:
    # Launch the job (submit, start a process, etc.)
    pass

def calculator_check(sequence) -> bool:
    # Return True when the job is done, False otherwise.
    # Polled repeatedly by EvoMD until True or the cycle budget runs out.
    return True

def analyzer_method(sequence) -> float:
    # Parse results and return the fitness value.
    return 0.0

Each function runs inside the sequence's iteration directory. The sequence object gives the residue string (str(sequence)), physicochemical properties (sequence.charge, sequence.residues, …), and state flags (sequence.is_failed to mark a failure). You can attach arbitrary attributes to carry information between functions (e.g. a job ID set in calculator_method and read in calculator_check).
See test_hm.py for a short example.
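As a slightly fuller illustration, the sketch below launches a background process and carries a job ID between functions. It is a toy under stated assumptions: the "simulation" just counts leucines, the file names and the `job_id` attribute are hypothetical, and `is_failed` is assumed to be a settable boolean attribute.

```python
import subprocess
import sys
import uuid
from pathlib import Path

# Stand-in "simulation": counts leucines and writes the number to a file.
JOB_SCRIPT = (
    "import sys; from pathlib import Path; "
    "seq = Path('sequence.txt').read_text(); "
    "Path(sys.argv[1]).write_text(str(seq.count('L')))"
)


def _result_file(sequence) -> Path:
    # The job ID attached in calculator_method tells us where to look.
    return Path(f"result_{sequence.job_id}.txt")


def constructor_method(sequence) -> None:
    # Write the input the job will read (current directory = iteration directory).
    Path("sequence.txt").write_text(str(sequence))


def calculator_method(sequence) -> None:
    # Attach a plain-string job ID so it survives serialization of the state.
    sequence.job_id = uuid.uuid4().hex[:8]
    subprocess.Popen([sys.executable, "-c", JOB_SCRIPT, str(_result_file(sequence))])


def calculator_check(sequence) -> bool:
    # Done once the result file exists and is non-empty.
    result = _result_file(sequence)
    return result.exists() and result.stat().st_size > 0


def analyzer_method(sequence) -> float:
    try:
        return float(_result_file(sequence).read_text())
    except (OSError, ValueError):
        # Assumption: setting this flag is how a failure is reported.
        sequence.is_failed = True
        return 0.0
```

Storing a string ID rather than the process handle keeps the sequence object simple to serialize; a real plug-in would typically store a scheduler job ID in the same way.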
| What you want to do | Command |
|---|---|
| Create a new Evolver and exit | python evo-md.py --file input.yaml --create-evolver |
| Start the evolution loop | python evo-md.py --file input.yaml --start |
| Resume an interrupted iteration | python evo-md.py --restart |
| Stop after the current iteration | python evo-md.py --stop-evolver |
| Show the current state | python evo-md.py --show-evolver |
| Export all sequences to CSV | python evo-md.py --report-sequences |
| Plot fitness over generations | python evo-md.py --plot-evolution |
| Roll back to the last complete generation | python evo-md.py --last-generation |
| Show the loaded configuration | python evo-md.py --show-current |
Run python evo-md.py --help for the full flag reference.
| File | Description |
|---|---|
| evolver.pkl | Full Evolver state, updated after each step. |
| sequences_report.csv | All sequences with generation and fitness. |
| evolution.png | Fitness plot (written by --plot-evolution). |
| simulation_data/<SEQUENCE>/iter_N/ | Per-sequence, per-attempt directories where your methods run. |
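When inspecting results by hand, it can help to find the most recent attempt for a sequence. The helper below is a hypothetical sketch, not part of EvoMD; it assumes the directory name is the residue string and that attempts are named exactly iter_ followed by an integer, as the layout above suggests.

```python
import re
from pathlib import Path


def latest_iteration_dir(sequence: str, root: str = "simulation_data"):
    """Return the iter_N directory with the highest N, or None if there is none."""
    seq_dir = Path(root) / sequence
    if not seq_dir.is_dir():
        return None
    numbered = []
    for child in seq_dir.iterdir():
        match = re.fullmatch(r"iter_(\d+)", child.name)
        if match and child.is_dir():
            numbered.append((int(match.group(1)), child))
    if not numbered:
        return None
    # Compare numerically so that iter_10 ranks above iter_2.
    return max(numbered, key=lambda item: item[0])[1]
```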
If a run is interrupted:
python evo-md.py --restart
If the last generation is incomplete, or you just want to go back to the last completed generation:
python evo-md.py --last-generation
Then continue with --start.
Use the old version to create a sequence report in CSV format.
python scripts_old/evo-md.py --report-sequences
Update input.yaml with the new variable names (some variables were refactored for better comprehension).
Create evolver.pkl and read the report with the new version.
python scripts_new/evo-md.py --create-evolver --file input.yaml --read-report sequence_report.csv
Start the evolution with the new version.
python scripts_new/evo-md.py --start
You can run evo-md using a LUMI container wrapper. The following instructions were adapted from the LUMI documentation: https://docs.lumi-supercomputer.eu/software/installing/container-wrapper/
Load the LUMI container module.
module load LUMI
module load lumi-container-wrapper
Create a conda environment file (e.g. env.yml).
# --- env.yml ---
channels:
- conda-forge
dependencies:
# Dependencies for evo-md
- python>=3.10
- numpy
- pyyaml
- matplotlib
# Include dependencies for your
# external methods (constructor, calculator, analyzer)
# e.g. scipy, pymol, vermouth, biopython, etc.
- pymol-open-source
- vermouth
Create the directory for the container (e.g. env/) and create the container.
mkdir env
conda-containerize new --prefix env env.yml
Execute Python from the container.
/users/USER/env/bin/python /users/USER/modular_evomd/evo-md.py --help
You can populate the Evolver from the JSON files created after each generation. The first step is to create a new Evolver using an input file with an adequate configuration (make sure that peptide_len equals the length of the sequences in the backup).
Then populate from backup:
python evo-md.py --from-backup
This creates sequences from the JSON files in the simulation directory and sorts them. The result is an Evolver with chosen parents, ready to start.
python evo-md.py --start
