CLI Options
============= 

| 

**Usage**: ``scrub.py INPUT -o OUTPUT_FILE.sdf [options]``

|

.. option::  INPUT 

    Input specification of the molecule to be generated. This can be a SMILES string,
    an ``.sdf`` or ``.mol`` MDL MolFile (with or without existing coordinates), or a list
    of SMILES in a ``.smi`` file, separated by newlines. ``cxsmiles`` is also supported.
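
    As a minimal sketch of the ``.smi`` input form, the snippet below writes a few
    SMILES to a file, one per line, and assembles the matching command line. The
    molecules and the file names ``ligands.smi`` and ``ligands_scrubbed.sdf`` are
    made up for illustration; only ``scrub.py``, the positional input, and ``-o``
    come from the usage line above.

    .. code-block:: python

        import shlex
        from pathlib import Path

        # Hypothetical example molecules; any valid SMILES strings would do.
        smiles = ["CCO", "c1ccccc1O", "CC(=O)O"]

        # A .smi input holds a list of SMILES separated by newlines.
        smi_path = Path("ligands.smi")
        smi_path.write_text("\n".join(smiles) + "\n")

        # Positional INPUT first, then the output file given with -o.
        cmd = ["scrub.py", str(smi_path), "-o", "ligands_scrubbed.sdf"]
        print(shlex.join(cmd))  # prints: scrub.py ligands.smi -o ligands_scrubbed.sdf

    Passing ``cmd`` to ``subprocess.run(cmd, check=True)`` would execute it, provided
    ``scrub.py`` is on the ``PATH``.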


Basic Options
---------------

.. option:: -o, --out_fname <OUTPUT_FILENAME.sdf : str>

    The output path/filename of the generated molecule, 
    in ``.sdf`` format. 

.. option:: --write_failed_mols <FILENAME.sdf : str>

    Write molecules that fail, for any reason, to the given ``.sdf`` file.

.. option:: --name_from_prop <PROP_NAME : str>

    Set the molecule name from an RDKit or SDF property.

.. option:: --ph <ph_value : float>

    Set the pH value at which the protonation state is determined.
    To specify a range instead, use ``--ph_low`` and
    ``--ph_high``. See also ``--pka_model``.

.. option:: --skip_acidbase 

    Skip the generation of different protonation states. By default,
    appropriate protonation states are generated. Relevant options:
    ``--ph``, ``--ph_low/--ph_high``, ``--pka_model``, ``--pka_fname``.

.. option:: --skip_tautomers 

    Skip the generation of different tautomeric states. By default, certain
    tautomers are enumerated. See also ``--tauto_fname``.

.. option:: --skip_ringfix 

    Skip several fixes that are applied to aliphatic rings. These fixes are a core
    part of molscrub, so using this option is not advisable. See also:
    ``--ring_minimize``, ``--energy_threshold``, ``--ff``, ``--max_ff_iter``.


.. option:: --skip_gen3d 

    Skip the generation of 3D structures altogether (including ringfix).

.. option:: --keep_all_frags

    If the molecule has more than one fragment, keep all of them rather
    than just the largest fragment (default). 

Acid/Base Options
------------------

.. option:: --ph_low/--ph_high <VALUE : float>

    Set the lower and upper bounds of the pH range used to determine
    the protonation states of the molecule. See also ``--ph`` and
    ``--pka_model``.
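
    The choice between a single pH and a pH range can be sketched as a small helper
    that builds the corresponding arguments. ``ph_args`` is a hypothetical function,
    not part of molscrub, and its checks (rejecting a mix of both forms, requiring
    both bounds) are conventions of this sketch rather than documented behavior of
    the CLI.

    .. code-block:: python

        def ph_args(ph=None, ph_low=None, ph_high=None):
            """Return CLI arguments for a single pH or a pH range (hypothetical helper)."""
            if ph is not None:
                # Assumption of this sketch: do not mix the two forms.
                if ph_low is not None or ph_high is not None:
                    raise ValueError("give either ph or ph_low/ph_high, not both")
                return ["--ph", str(ph)]
            # Assumption of this sketch: a range always needs both bounds.
            if ph_low is None or ph_high is None:
                raise ValueError("a range needs both ph_low and ph_high")
            if ph_low > ph_high:
                raise ValueError("ph_low must not exceed ph_high")
            return ["--ph_low", str(ph_low), "--ph_high", str(ph_high)]


        single = ph_args(ph=7.4)
        window = ph_args(ph_low=6.4, ph_high=8.4)

    The returned lists can be appended to a ``scrub.py`` argument list before it is
    handed to ``subprocess.run``.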

.. option:: --pka_fname <FILENAME : str>

    A file with SMARTS reactions for acid-base reactions. See
    ``molscrub/data/pka_reactions.txt`` for an example. This option is
    incompatible with ``--pka_model etr1``.

.. option:: --pka_model [rules, etr1]

    Select the pKa and protonation model to use. ``rules`` selects a
    rules-based approach where pKa values for several reaction types
    are defined in ``molscrub/data/pka_reactions.txt`` (or a custom file
    if ``--pka_fname`` is used). ``etr1`` is a hybrid method that uses
    predefined rules and a tree-based ML model for pKa prediction. ETR1 is
    more accurate, but slightly more expensive when multiple protonation
    sites are involved.

    For now, ``rules`` is the default, though we expect the ML-based approach
    to become the default in the future. 
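
    Because a custom ``--pka_fname`` file only applies to the ``rules`` model, a
    wrapper script may want to catch the invalid combination before launching a
    run. The helper below is a hypothetical sketch of such a check; ``pka_args``
    and the file name ``my_pka_reactions.txt`` are illustrative and not part of
    molscrub.

    .. code-block:: python

        def pka_args(model="rules", pka_fname=None):
            """Build pKa-related CLI arguments, rejecting etr1 plus a custom file."""
            if model not in ("rules", "etr1"):
                raise ValueError(f"unknown pKa model: {model!r}")
            if model == "etr1" and pka_fname is not None:
                # The custom reactions file is documented as incompatible with etr1.
                raise ValueError("--pka_fname cannot be combined with --pka_model etr1")
            args = ["--pka_model", model]
            if pka_fname is not None:
                args += ["--pka_fname", pka_fname]
            return args


        custom_rules = pka_args("rules", "my_pka_reactions.txt")
        ml_model = pka_args("etr1")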

.. option:: --tauto_fname <FILENAME : str>

    A file with SMARTS reactions for enumerating tautomeric states.
    See ``molscrub/data/tautomers.txt`` for an example.

Geometry Options 
------------------

.. option:: --skip_etkdg 

    Use existing 3D coordinates whenever possible. 

    Normally, the ETKDG module of RDKit is used to generate the initial
    3D coordinates of the molecule. If the user supplies an input (e.g. ``.sdf`` file) 
    with 3D coordinates, this option allows Molscrub to use those coordinates as much
    as possible, using constrained rather than full embedding. 

.. option:: --num_internal_confs <CONFS : int>

    Number of initial conformers generated by ETKDG (default=3). 
    Note that this does not change the number of molecules produced, only the 
    initial conformations, of which the best is selected using heuristics or 
    FF optimization.

.. option:: --min_output_confs <CONFS : int>

    Minimum number of output conformers for each protonation/tautomeric state. 

.. option:: --etkdg_rng_seed <SEED : int> 

    Random seed for ETKDG (useful for reproducibility).

.. option:: --ff [uff, mmff94, mmff94s, espaloma]

    Choose the forcefield that molscrub uses for geometry optimization.
    This can affect the final geometry, and also the determination of ideal
    isomers during ``ringfix`` if ``--ring_minimize`` is used.

.. option:: --max_ff_iter <ITERS : int> 

    Maximum number of steps during FF optimization. Note that FFs may be 
    used at several steps during the process if the ``--ring_minimize`` 
    option is used. 

.. option:: --template <FILENAME : str>

    ``.sdf`` or ``.mol`` file with 3D coordinates to constrain the 
    ETKDG embedding. See also ``--skip_etkdg``.  

.. option:: --template_smarts <SMARTS_PATTERN : str>

    SMARTS pattern matching atoms of the template provided by the ``--template`` option.
    This selects which atoms to preserve in the constrained embedding.

.. option:: --use_random_coords 

    Use random coordinates for more robust (but slower) ETKDG embedding.

.. option:: --num_etkdg_attempts <num : int>

    Number of ETKDG conformer generation attempts. Useful for large or unusual molecules
    where ETKDG struggles to generate conformers. 

.. option:: --ring_minimize 

    Use FF instead of heuristics to determine the optimal ring conformer (i.e. lower 
    energy conformers are selected). 

    The accuracy of this method depends on the choice of forcefield,
    and results so far have been inconsistent.

.. option:: --energy_threshold <THRESHOLD : float>

    If ``--ring_minimize`` is chosen, set the energy threshold used to select
    the optimal conformer (default 0.5 kcal/mol). If the energy difference is below
    the threshold, both conformers are kept.
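
    The threshold rule can be illustrated with a few lines of plain Python. This is
    a sketch of the selection logic as described here, not molscrub's actual
    implementation: the function name, the conformer labels, and the extension from
    two conformers to any number of them are assumptions.

    .. code-block:: python

        def select_ring_conformers(energies, threshold=0.5):
            """Keep every conformer whose energy gap to the lowest one is below threshold.

            energies maps a conformer label to its FF energy in kcal/mol.
            """
            e_min = min(energies.values())
            return [name for name, e in energies.items() if e - e_min < threshold]


        # Gap of 0.3 kcal/mol is below the 0.5 default, so both are kept.
        close_pair = select_ring_conformers({"conf_a": 0.0, "conf_b": 0.3})

        # Gap of 5.5 kcal/mol exceeds the threshold, so only the lower one survives.
        far_pair = select_ring_conformers({"conf_a": 0.0, "conf_b": 5.5})

    A larger threshold therefore tends to produce more output conformers per ring
    system, while a smaller one keeps only the lowest-energy candidate.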


Misc Options 
--------------

.. option:: -h / --help 

    Show the help message with all the CLI options.

.. option:: --cpu <CPUS : int>

    Number of CPUs to run in parallel. Note that parallelization happens across multiple
    molecules.
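
    Since the work is split per molecule, requesting more workers than there are
    input molecules presumably brings no benefit. A wrapper script could pick the
    value along these lines; ``suggested_cpus`` is a hypothetical helper, not part
    of molscrub.

    .. code-block:: python

        import os


        def suggested_cpus(n_molecules, available=None):
            """Cap the worker count at the number of molecules and of available CPUs."""
            if available is None:
                # os.cpu_count() may return None, so fall back to a single CPU.
                available = os.cpu_count() or 1
            return max(1, min(n_molecules, available))


        one_molecule = suggested_cpus(1, available=8)
        big_library = suggested_cpus(1000, available=8)
        cpu_args = ["--cpu", str(big_library)]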

.. option:: --debug 

    Show errors and other helpful debugging messages. 

.. option:: --wcg 

    Ensure that molecule names and suffixes are integers.

.. option:: --charge_model [espaloma, nagl]

    Add partial charges to the output ``.sdf`` file using either the espaloma
    or the NAGL model.