CLI Options
Usage: scrub.py INPUT -o OUTPUT_FILE.sdf [options]
- INPUT
Input specification of the molecule(s) to be generated. This can be a SMILES string, an MDL MolFile (.sdf or .mol) with or without existing coordinates, or a list of SMILES in a .smi file (one SMILES per line). CXSMILES is also supported.
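A minimal sketch for preparing input from Python. It writes a .smi file with one SMILES per line and builds the matching scrub.py command without running it. The helper names write_smi and scrub_command are hypothetical and not part of molscrub. The second call shows why a SMILES passed directly as INPUT usually needs quoting: characters such as parentheses are special to the shell, and shlex.join adds the quotes.

```python
import shlex
import tempfile
from pathlib import Path


def write_smi(smiles_list, path):
    """Hypothetical helper: write one SMILES per line, the .smi layout described above."""
    path = Path(path)
    path.write_text("\n".join(smiles_list) + "\n")
    return path


def scrub_command(input_spec, out_fname, *options):
    """Hypothetical helper: build (but do not run) a scrub.py argument list."""
    return ["scrub.py", str(input_spec), "-o", str(out_fname), *options]


workdir = Path(tempfile.mkdtemp())
smi_path = write_smi(["CCO", "c1ccccc1O", "CC(=O)O"], workdir / "ligands.smi")
cmd = scrub_command(smi_path, workdir / "ligands.sdf", "--ph", "7.4")
print(shlex.join(cmd))

# A single SMILES can also be the INPUT; shlex.join quotes the parentheses.
print(shlex.join(scrub_command("CC(=O)O", "acetic_acid.sdf")))
# scrub.py 'CC(=O)O' -o acetic_acid.sdf
```

Keeping the command as a list means it could be handed to subprocess.run(cmd, check=True) without any shell quoting, assuming molscrub is installed and scrub.py is on PATH.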
Basic Options
- -o, --out_fname <OUTPUT_FILENAME.sdf : str>
The output path/filename for the generated molecule(s), in .sdf format.
- --write_failed_mols <FILENAME.sdf : str>
Write molecules that fail, for any reason, to this file.
- --name_from_prop <PROP_NAME : str>
Set the molecule name from an RDKit or SDF property.
- --ph <ph_value : float>
Set the pH at which the protonation state is determined. To specify a pH range instead, use --ph_low and --ph_high. See also --pka_model.
- --skip_acidbase
Skip the generation of different protonation states. By default, appropriate protonation states are generated. Relevant options: --ph, --ph_low/--ph_high, --pka_model, --pka_fname.
- --skip_tautomers
Skip the generation of different tautomeric states. By default, certain tautomers are enumerated. See also --tauto_fname.
- --skip_ringfix
Skip several fixes that are applied to aliphatic rings. Ring fixing is a big part of molscrub, so using this option is not advisable. See also: --ring_minimize, --energy_threshold, --ff, --max_ff_iter.
- --skip_gen3d
Skip the generation of 3D structures altogether (including ringfix).
- --keep_all_frags
If the molecule has more than one fragment, keep all of them rather than only the largest fragment (the default).
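For --name_from_prop, an SDF property is a data item stored after the molblock of each record: a header line such as "> <compound_id>", followed by value lines that end at a blank line, with records separated by "$$$$". The stdlib sketch below only shows where such a value lives in an SDF file. The function sdf_data_items, the property names, and the sample record are hypothetical and made up for illustration; this is not how molscrub reads properties.

```python
def sdf_data_items(sdf_text):
    """Hypothetical helper: return one {property_name: value} dict per SDF record.

    The molblock is skipped; only data items are read, i.e. a header line
    starting with '>' that holds the name in angle brackets, followed by
    value lines that end at the first blank line.
    """
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # whitespace after the last record separator
        props = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            start = line.find("<") + 1
            end = line.find(">", start)
            if line.startswith(">") and start > 0 and end > start:
                i += 1
                values = []
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                props[line[start:end]] = "\n".join(values)
            i += 1
        records.append(props)
    return records


# A made-up single-record SDF (heavy atoms of ethanol) with two data items.
SAMPLE_SDF = """example_ligand

  hypothetical record for illustration
  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    1.4000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
M  END
> <compound_id>
LIG-001

> <vendor>
example

$$$$
"""

print(sdf_data_items(SAMPLE_SDF)[0]["compound_id"])  # LIG-001
```

With --name_from_prop compound_id, a value like LIG-001 is the kind of property that would be used as the molecule name.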
Acid/Base Options
- --ph_low/--ph_high <VALUE : float>
Set the low and high ends of the pH range that determines the protonation states of the molecule. See also --ph and --pka_model.
- --pka_fname <FILENAME : str>
A file with SMARTS reactions for acid-base reactions. See molscrub/data/pka_reactions.txt for an example. This is incompatible with pka_model=etr1.
- --pka_model [rules, etr1]
Select the pKa and protonation model to use.
rules selects a rules-based approach where pKa values for several reaction types are defined in molscrub/data/pka_reactions.txt (or in a custom file if --pka_fname is used). etr1 is a hybrid method that uses predefined rules and a tree-based ML model for pKa prediction. ETR1 is more accurate, but slightly more expensive when multiple protonation sites are involved. For now, rules is the default, though we expect the ML-based approach to become the default in the future.
- --tauto_fname <FILENAME : str>
A file with SMARTS reactions for enumerating tautomeric states. See molscrub/data/tautomers.txt for an example.
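The pH options matter because the dominant protonation state of a site depends on the pH relative to its pKa. The sketch below is the textbook Henderson-Hasselbalch relation, not molscrub's rules or etr1 implementation. The function names and the 1% population cutoff are illustrative assumptions, and the pKa of 6.5 is a hypothetical site.

```python
def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch fraction of a site in its protonated form.

    `pka` refers to the protonated species: HA for an acid, BH+ for a base.
    At ph == pka the two forms are equally populated (fraction 0.5).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def states_in_range(pka, ph_low, ph_high, cutoff=0.01):
    """Forms reaching at least `cutoff` population somewhere in [ph_low, ph_high].

    The 1% cutoff is an arbitrary illustration, not a molscrub setting.
    The protonated fraction falls monotonically with pH, so it peaks at
    ph_low, while the deprotonated fraction peaks at ph_high.
    """
    states = set()
    if fraction_protonated(pka, ph_low) >= cutoff:
        states.add("protonated")
    if 1.0 - fraction_protonated(pka, ph_high) >= cutoff:
        states.add("deprotonated")
    return states


# Acetic acid (pKa about 4.76) at a single pH of 7.4:
print(round(fraction_protonated(4.76, 7.4), 4))  # 0.0023
print(states_in_range(4.76, 7.4, 7.4))           # {'deprotonated'}
# A hypothetical site with pKa 6.5 over the range 6.0 to 8.0:
print(sorted(states_in_range(6.5, 6.0, 8.0)))    # ['deprotonated', 'protonated']
```

A single pH behaves like a range whose two ends coincide; widening the range is what lets both forms of a site whose pKa lies near the window become relevant.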
Geometry Options
- --skip_etkdg
Use existing 3D coordinates whenever possible.
Normally, RDKit's ETKDG module is used to generate the initial 3D coordinates of the molecule. If the user supplies an input (e.g. an .sdf file) with 3D coordinates, this option lets Molscrub use those coordinates as much as possible, using constrained rather than full embedding.
- --num_internal_confs <CONFS : int>
Number of initial conformers generated by ETKDG (default: 3). This does not change the number of molecules produced, only the number of initial conformations, from which the best is selected using heuristics or FF optimization.
- --min_output_confs <CONFS : int>
Minimum number of output conformers for each protonation/tautomeric state.
- --etkdg_rng_seed <SEED : int>
Random seed for ETKDG (useful for reproducibility).
- --ff [uff, mmff94, mmff94s, espaloma]
Choose the forcefield that molscrub uses for geometry optimization. This can affect the final geometry, and also the selection of ideal isomers during ringfix if --ring_minimize is used.
- --max_ff_iter <ITERS : int>
Maximum number of steps during FF optimization. Note that FFs may be used at several stages of the process if the --ring_minimize option is used.
- --template <FILENAME : str>
An .sdf or .mol file with 3D coordinates used to constrain the ETKDG embedding. See also --skip_etkdg.
- --template_smarts <SMARTS_PATTERN : str>
SMARTS pattern matching atoms of the template provided by the --template option. In effect, this selects which atoms are preserved in the constrained embedding.
- --use_random_coords
Use random coordinates for more robust (but slower) ETKDG embedding.
- --num_etkdg_attempts <num : int>
Number of ETKDG conformer generation attempts. Useful for large or unusual molecules where ETKDG struggles to generate conformers.
- --ring_minimize
Use the FF instead of heuristics to determine the optimal ring conformer (i.e. lower-energy conformers are selected).
The accuracy of this method depends on the choice of forcefield. So far, results have been hit or miss.
- --energy_threshold <THRESHOLD : float>
If --ring_minimize is chosen, set the energy threshold used to select the optimal conformer (default: 0.5 kcal/mol). If the energy difference between conformers is below the threshold, both conformers are kept.
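A minimal sketch of the selection rule described for --ring_minimize and --energy_threshold, assuming each candidate ring conformer already has a forcefield energy in kcal/mol. The function name, conformer labels, and energies are hypothetical; this is not molscrub's ringfix code.

```python
def select_conformers(energies, threshold=0.5):
    """Return the conformer labels to keep, lowest energy first.

    energies: mapping of a conformer label to its forcefield energy (kcal/mol).
    threshold: energy window in kcal/mol (0.5 mirrors the documented default).
    The minimum-energy conformer is always kept; any other conformer is also
    kept when it lies less than `threshold` above that minimum.
    """
    if not energies:
        raise ValueError("need at least one conformer energy")
    e_min = min(energies.values())
    kept = [label for label, e in energies.items()
            if e == e_min or e - e_min < threshold]
    return sorted(kept, key=energies.get)


# Made-up energies for three ring conformers of one molecule.
ring_energies = {"chair_a": -12.3, "chair_b": -12.0, "twist_boat": -8.1}
print(select_conformers(ring_energies))                 # ['chair_a', 'chair_b']
print(select_conformers(ring_energies, threshold=0.2))  # ['chair_a']
```

The two chairs differ by 0.3 kcal/mol, so both survive the default window, while a tighter threshold keeps only the lowest one. Because the ranking comes from FF energies, the outcome inherits whatever errors the chosen --ff has.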
Misc Options
- -h / --help
Show the help message with all the CLI options.
- --cpu <CPUS : int>
Number of CPUs to run in parallel. Note that parallelization happens across molecules, not within a single molecule.
- --debug
Show errors and other helpful debugging messages.
- --wcg
Ensure that molecule names and suffixes are integers.
- --charge_model [espaloma, nagl]
Add partial charges to the output .sdf file using either the espaloma or the NAGL model.