Working with Fragment Catalogs

Document Version: $Revision: 1.1 $

To start from scratch, the tool requires a CSV file with a SMILES column and an activity column. Other columns may be present as well; the --smiCol and --actCol arguments specify which columns hold the SMILES and the activity values.
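As an illustration of the expected input, the sketch below writes a small CSV with a SMILES column, an activity column, and one extra column, then reads the two relevant columns back. The column names ("id", "smiles", "activity"), the header row, and the selection of columns by name are assumptions made for this example; this document does not state whether --smiCol and --actCol take names or indices.

```python
import csv
import io

# Hypothetical rows: the "id" column stands in for any extra column.
rows = [
    {"id": "mol-1", "smiles": "CCO", "activity": "1"},
    {"id": "mol-2", "smiles": "c1ccccc1", "activity": "0"},
    {"id": "mol-3", "smiles": "CC(=O)O", "activity": "1"},
]


def write_input_csv(handle, rows):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(handle, fieldnames=["id", "smiles", "activity"])
    writer.writeheader()
    writer.writerows(rows)


def read_columns(handle, smi_col, act_col):
    """Return (SMILES, activity) pairs, ignoring every other column."""
    reader = csv.DictReader(handle)
    return [(rec[smi_col], int(rec[act_col])) for rec in reader]


# An in-memory buffer keeps the example self-contained; a real run
# would write to a file on disk instead.
buffer = io.StringIO()
write_input_csv(buffer, rows)
buffer.seek(0)
pairs = read_columns(buffer, smi_col="smiles", act_col="activity")
```

Here `pairs` holds only the SMILES and activity values, which mirrors the idea that extra columns are allowed but not used.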

There are four steps to the process:

Build the fragment catalog, command line argument -b
This loops through a set of molecules and builds a fragment catalog containing all unique fragments found in the molecules.
Requirements:
- InData
Important arguments:
- -n: specifies the maximum number of molecules to be considered
- --catalog=[filename]: provides the name of the file to be used to store the pickled catalog.
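The build step described above can be pictured with a toy model: each molecule contributes a list of fragments, every fragment seen for the first time gets the next free index, and the result is pickled. The sketch below is not the tool's implementation. The fragment labels are hypothetical strings that stand in for real fragments, and `max_mols` only mimics the role of -n.

```python
import pickle


def build_catalog(fragment_lists, max_mols=None):
    """Map each unique fragment label to a stable integer index."""
    catalog = {}
    for n_seen, fragments in enumerate(fragment_lists):
        # Stop once the requested number of molecules has been processed.
        if max_mols is not None and n_seen >= max_mols:
            break
        for frag in fragments:
            # Only fragments not seen before receive a new index.
            catalog.setdefault(frag, len(catalog))
    return catalog


# Hypothetical fragment labels, one list per molecule.
fragment_lists = [
    ["C-C", "C-O"],
    ["C-O", "C=O"],
    ["C-N"],
]

catalog = build_catalog(fragment_lists, max_mols=2)

# Round-trip through pickle, as a stored catalog file would be.
restored = pickle.loads(pickle.dumps(catalog))
```

With `max_mols=2` the third molecule is never read, so its fragment does not enter the catalog.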

Score molecules against the catalog, command line argument -s

Requirements:
- InData
- A Catalog
Important arguments:
- -n: specifies the maximum number of molecules to be considered
- --catalog=[filename]: provides the name of the file containing a pickled catalog.
- --scores=[filename]: provides the name of the file to be used to store the pickled compound scores
- --onbits=[filename]: provides the name of the file to be used for pickled OnBit lists (lists with the bits set by each molecule screened). Providing this option can save a lot of time.
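To make the OnBit lists concrete, the sketch below computes, for each screened molecule, the catalog indices it sets, pickles those lists, and then tallies per-bit counts by activity class from the cached copy. This is a toy model under stated assumptions: the catalog, fragment labels, and activities are hypothetical, and the per-bit tally is only one plausible shape for "scores", not the tool's actual file format.

```python
import pickle

# Hypothetical catalog: fragment label -> bit index.
catalog = {"C-C": 0, "C-O": 1, "C=O": 2}

# Hypothetical screened molecules, already reduced to fragment labels.
screened = [
    ["C-C", "C-O"],
    ["C-O", "C=O", "C-N"],  # "C-N" is not in the catalog and sets no bit
    ["C=O"],
]
activities = [1, 0, 1]


def on_bits(fragments, catalog):
    """Return the sorted catalog indices that a molecule switches on."""
    return sorted({catalog[f] for f in fragments if f in catalog})


def tally(on_bit_lists, activities, n_classes=2):
    """Count, per bit, how many molecules of each activity class set it."""
    counts = {}
    for bits, act in zip(on_bit_lists, activities):
        for bit in bits:
            counts.setdefault(bit, [0] * n_classes)[act] += 1
    return counts


on_bit_lists = [on_bits(frags, catalog) for frags in screened]

# Caching the lists means a later run can tally without re-screening.
cached = pickle.loads(pickle.dumps(on_bit_lists))
scores = tally(cached, activities)
```

The tally runs on the cached lists alone, which suggests why storing them can save time: the fragment matching does not have to be repeated.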

Calculate information gains for the molecules, command line argument -g

Requirements:
- Scores
Important arguments:
- --scores=[filename]: provides the name of the file containing pickled compound scores
- --gains=[filename]: provides the name of the file to be used to store the gains (a csv file).
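For reference, the sketch below computes the textbook information gain of a single bit from class counts: the entropy of the activity classes minus the weighted entropy after splitting on whether the bit is set. It assumes the standard Shannon-entropy definition; the exact calculation and any corrections the tool applies are not described in this document.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def info_gain(on_counts, off_counts):
    """Information gain from splitting molecules on one bit.

    on_counts[i]  = molecules of class i that set the bit
    off_counts[i] = molecules of class i that do not set the bit
    """
    totals = [a + b for a, b in zip(on_counts, off_counts)]
    n = sum(totals)
    if n == 0:
        return 0.0
    n_on, n_off = sum(on_counts), sum(off_counts)
    return (entropy(totals)
            - (n_on / n) * entropy(on_counts)
            - (n_off / n) * entropy(off_counts))


# A bit set only by actives separates the classes completely.
perfect = info_gain(on_counts=[0, 5], off_counts=[5, 0])

# A bit set equally often in both classes carries no information.
useless = info_gain(on_counts=[2, 2], off_counts=[3, 3])
```

A bit that perfectly separates two balanced classes reaches the maximum gain for that data set, while a bit that is equally common in both classes has a gain of zero.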

Display details about the fragments, command line argument -d

Requirements:
- Catalog
- Gains
Important arguments:
- --nBits=[value]: provides the maximum number of bits on which to report (they are presented in order of decreasing gain).
- --catalog=[filename]: provides the name of the file containing the pickled catalog.
- --gains=[filename]: provides the name of the file containing the calculated gains (a CSV file).
- --details=[filename]: provides the name of the file to be used to store the details (a CSV file).
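The effect of --nBits can be sketched as "sort by decreasing gain, keep the first N". The example below does this for a small in-memory gains table. The column names "bit" and "gain" and the values are hypothetical; the actual layout of the gains file is not specified in this document.

```python
import csv
import io

# Hypothetical gains table; a real run would read the gains CSV file.
gains_csv = "bit,gain\n0,0.12\n1,0.85\n2,0.40\n3,0.63\n"


def top_bits(handle, n_bits):
    """Return up to n_bits (bit, gain) pairs in order of decreasing gain."""
    reader = csv.DictReader(handle)
    rows = [(int(rec["bit"]), float(rec["gain"])) for rec in reader]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n_bits]


report = top_bits(io.StringIO(gains_csv), n_bits=2)
```

Asking for more bits than the table contains simply returns every row, still ordered by gain.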