Working with Fragment Catalogs
Document Version: $Revision: 1.1 $
To start from scratch, the tool requires a CSV file with a SMILES
column and an activity column. The file may contain other columns
as well; use the --smiCol and --actCol arguments to identify the
SMILES and activity columns.
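As an illustration, a minimal input file could be produced with the sketch below. The file name compounds.csv, the column headers (ID, SMILES, ACT), the header row itself, and the activity labels are all invented for the example. This document does not say whether --smiCol and --actCol expect a column name or a column index, so check the tool's help output before relying on either.

```python
import csv

# Hypothetical example data: the column names and activity labels are
# made up for illustration and are not taken from the tool.
ROWS = [
    {"ID": "mol-1", "SMILES": "CCO", "ACT": 0},
    {"ID": "mol-2", "SMILES": "c1ccccc1O", "ACT": 1},
    {"ID": "mol-3", "SMILES": "CC(=O)O", "ACT": 0},
]


def write_input_csv(path, rows=ROWS):
    """Write a CSV with a SMILES column, an activity column, and one extra column."""
    fieldnames = ["ID", "SMILES", "ACT"]
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()  # assumption: the tool accepts a header row
        writer.writerows(rows)
    return fieldnames


if __name__ == "__main__":
    write_input_csv("compounds.csv")
```

The extra ID column is there only to show that additional columns can sit alongside the two required ones.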
There are four steps to the process:
- Build the fragment catalog, command line argument -b
This loops through a set of molecules and builds a fragment
catalog containing all unique fragments found in the molecules.
Requirements:
Important arguments:
- -n: specifies the maximum number of molecules to be considered
- --catalog=[filename]: provides the name of the file to be used to store the pickled catalog.
- Score molecules against the catalog, command line argument -s
Requirements:
Important arguments:
- -n: specifies the maximum number of molecules to be considered
- --catalog=[filename]: provides the name of the file containing a
pickled catalog.
- --scores=[filename]: provides the name of the file to be used to store
the pickled compound scores.
- --onbits=[filename]: provides the name of the file to be used for
pickled OnBit lists (the lists of bits set by each molecule screened).
Providing this option can save a lot of time.
- Calculate information gains for the molecules, command line argument -g
Requirements:
Important arguments:
- --scores=[filename]: provides the name of the file containing the
pickled compound scores.
- --gains=[filename]: provides the name of the file to be used to store
the gains (a CSV file).
- Display details about the fragments, command line argument -d
Requirements:
Important arguments:
- --nBits=[value]: provides the maximum number of bits on which to report
(they are presented in order of decreasing gain).
- --catalog=[filename]: provides the name of the file containing the
pickled catalog.
- --gains=[filename]: provides the name of the file containing the
calculated gains (a CSV file).
- --details=[filename]: provides the name of the file to be used to store
the details (a CSV file).
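The four steps above can be chained by reusing each step's output file as the next step's input. The following sketch only assembles the command lines; it does not run anything. Several things in it are assumptions rather than facts from this document: the script name fragment_tool.py, every output file name, the "-n VALUE" spelling, and the convention that the input CSV is passed as the last positional argument of the build and score steps. Only the option names themselves are taken from the lists above.

```python
import shlex

# Hypothetical script name: this document does not name the tool's executable.
SCRIPT = "fragment_tool.py"


def build_commands(csv_file, smi_col, act_col, max_mols=None, n_bits=None):
    """Return one argument list per step, in order: build, score, gains, details."""
    columns = [f"--smiCol={smi_col}", f"--actCol={act_col}"]
    # Assumption: -n takes its value as a separate argument.
    limit = ["-n", str(max_mols)] if max_mols is not None else []

    # Assumption: the input CSV is the final positional argument.
    build = [SCRIPT, "-b", *limit, "--catalog=catalog.pkl", *columns, csv_file]
    score = [SCRIPT, "-s", *limit, "--catalog=catalog.pkl",
             "--scores=scores.pkl", "--onbits=onbits.pkl", *columns, csv_file]
    gains = [SCRIPT, "-g", "--scores=scores.pkl", "--gains=gains.csv"]
    details = [SCRIPT, "-d", "--catalog=catalog.pkl", "--gains=gains.csv",
               "--details=details.csv"]
    if n_bits is not None:
        details.insert(2, f"--nBits={n_bits}")
    return [build, score, gains, details]


if __name__ == "__main__":
    # The column values are placeholders; substitute whatever form the tool accepts.
    for cmd in build_commands("compounds.csv", "SMILES_COLUMN", "ACTIVITY_COLUMN",
                              max_mols=1000, n_bits=20):
        print(shlex.join(cmd))
```

Building the commands as argument lists rather than strings keeps them safe to pass to subprocess.run later without shell quoting problems.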
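To give a feel for what a gain value expresses, here is a sketch of the standard entropy-based information gain for a single fragment bit. This document does not specify the formula the tool uses, so treat this as the textbook definition and not as a description of the tool's internals. The function names and the count layout (one count per activity class) are invented for the example.

```python
import math


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def info_gain(on_counts, off_counts):
    """Information gain from splitting molecules on one fragment bit.

    on_counts[k]  = number of molecules in activity class k that set the bit
    off_counts[k] = number of molecules in activity class k that do not
    """
    n_on, n_off = sum(on_counts), sum(off_counts)
    total = n_on + n_off
    if total == 0:
        return 0.0
    overall = [a + b for a, b in zip(on_counts, off_counts)]
    conditional = ((n_on / total) * entropy(on_counts)
                   + (n_off / total) * entropy(off_counts))
    return entropy(overall) - conditional


if __name__ == "__main__":
    # Hypothetical counts for three bits: [inactive, active] with and without the bit.
    bits = {
        "bit_a": ([0, 5], [5, 0]),  # bit set only in actives
        "bit_b": ([5, 5], [5, 5]),  # bit unrelated to activity
        "bit_c": ([1, 4], [4, 1]),
    }
    ranked = sorted(bits, key=lambda name: info_gain(*bits[name]), reverse=True)
    print(ranked)  # bits in order of decreasing gain
```

A bit that separates actives from inactives perfectly gets the maximum gain, while a bit that is equally common in both classes gets a gain of zero.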