crux percolator
Usage:
crux percolator [options] <protein input> <search results directory>
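As an illustration of the usage line above, the following minimal sketch assembles the argument list for a crux percolator run from Python. The helper name build_percolator_command and the paths proteins.fasta and search-results are hypothetical; the option names are the ones documented on this page, and the sketch assumes the crux executable is on the PATH.

```python
def build_percolator_command(protein_input, search_results_dir, options=None):
    """Return an argv list: crux percolator [options] <protein input> <dir>.

    `options` maps option names (without the leading "--") to values.
    This helper is illustrative only and is not part of Crux.
    """
    argv = ["crux", "percolator"]
    for name, value in (options or {}).items():
        # Options come before the two positional arguments.
        argv += [f"--{name}", str(value)]
    argv += [protein_input, search_results_dir]
    return argv


# Hypothetical file and directory names, for illustration only.
command = build_percolator_command(
    "proteins.fasta",
    "search-results",
    {"output-dir": "percolator-output", "overwrite": "T"},
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to launch the program.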
Description:
Percolator is a semi-supervised learning algorithm that dynamically learns to separate target from decoy PSMs. The algorithm is described in this article:
Lukas Käll, Jesse Canterbury, Jason Weston, William Stafford Noble and Michael J. MacCoss. "Semi-supervised learning for peptide identification from shotgun proteomics datasets." Nature Methods. 4(11):923-925, 2007.
For more on q-values and posterior error probabilities (PEP), see the documentation for calibrate-scores.
Crux now includes code from Percolator 1.05. The current version of Percolator can be downloaded here. External versions of Percolator can be applied to the output of the Crux tools, but are not used by the crux percolator command.
Percolator requires as input two collections of PSMs: one derived from matching observed spectra against real ("target") peptides, and a second derived from matching against "decoy" peptides. Crux generates these decoys on the fly. Percolator will also accept a second set of decoy PSMs, using one set for training and the other for calculating q-values. Producing two sets of decoy PSMs is the default behavior of search-for-matches.
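To make the role of decoy PSMs in q-value calculation more concrete, here is a minimal sketch of a generic target-decoy q-value estimate. It is not Percolator's actual procedure: the function name decoy_q_values, the assumption that higher scores are better, and the way pi_zero scales the estimate are all illustrative choices.

```python
def decoy_q_values(target_scores, decoy_scores, pi_zero=1.0):
    """Estimate a q-value for each target score using decoy scores.

    Assumes higher scores are better. Returns (score, q_value) pairs
    sorted from best to worst score. Generic sketch, not Crux code.
    """
    if not decoy_scores:
        raise ValueError("at least one decoy score is required")
    targets = sorted(target_scores, reverse=True)
    decoys = sorted(decoy_scores, reverse=True)
    scale = pi_zero * len(targets) / len(decoys)

    fdrs = []
    decoys_above = 0
    for rank, score in enumerate(targets, start=1):
        # Count decoys scoring at least as well as this target.
        while decoys_above < len(decoys) and decoys[decoys_above] >= score:
            decoys_above += 1
        # Estimated false targets divided by accepted targets.
        fdrs.append(min(1.0, scale * decoys_above / rank))

    # The q-value is the smallest FDR at this threshold or any looser one.
    q_values = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()
    return list(zip(targets, q_values))


for score, q in decoy_q_values([7, 10, 8, 9], [4, 8.5, 6, 5]):
    print(score, q)
```

Lowering pi_zero below 1.0 shrinks every estimate proportionally, which reflects the assumption that only that fraction of target scores follows the null distribution.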
The features used by percolator to represent each PSM are summarized here.
Input:
- <protein input> – The name of the file in fasta format or the directory containing the protein index from which to retrieve proteins and peptides.
- <search results directory> – A folder in which all the PSM result files are located. The program looks for files produced by crux search-for-matches (ending in '.search.target.txt' or '.search.decoy.txt'). All such files in the given directory are analyzed jointly. Note that the directory should not contain results from both types of search algorithms.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- percolator.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- percolator.target.txt: a tab-delimited text file containing the PSMs. See txt file format for a list of the fields.
- percolator.target.pep.xml: a file containing the PSMs in pepXML format. See pep xml file format for further reference. This file can be used as input to some of the tools in the Trans-Proteomic Pipeline.
- percolator.log.txt: a log file containing a copy of all messages that were printed to stderr.
Options:
--pi-zero <value>
– The estimated proportion of target scores that are drawn according to the null distribution. Default = 1.0.
--mod-mass-format <mod-only|total|separate>
– Specify how sequence modifications are reported in various output files. Each modification is reported as a number enclosed in square brackets following the modified residue; however, the number may correspond to one of three different masses: (1) 'mod-only' reports the value of the mass shift induced by the modification; (2) 'total' reports the mass of the residue with the modification (residue mass plus modification mass); (3) 'separate' is the same as 'mod-only', but multiple modifications to a single amino acid are reported as a comma-separated list of values. For example, suppose amino acid D has an unmodified mass of 115 as well as two modifications of masses +14 and +2. In this case, the amino acid would be reported as D[16] with 'mod-only', D[131] with 'total', and D[14,2] with 'separate'.
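The three reporting modes can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Crux's implementation: the function name format_modified_residue is hypothetical, and the number formatting (no fixed decimal precision) is an assumption.

```python
def format_modified_residue(residue, residue_mass, mod_masses, mode="mod-only"):
    """Render a modified residue the way each mod-mass-format mode describes.

    residue      -- one-letter amino acid code, e.g. "D"
    residue_mass -- unmodified residue mass
    mod_masses   -- list of modification mass shifts on this residue
    """
    if mode == "mod-only":
        # Sum of all mass shifts on the residue.
        label = f"{sum(mod_masses):g}"
    elif mode == "total":
        # Residue mass plus all mass shifts.
        label = f"{residue_mass + sum(mod_masses):g}"
    elif mode == "separate":
        # Each mass shift listed individually.
        label = ",".join(f"{mass:g}" for mass in mod_masses)
    else:
        raise ValueError(f"unknown mod-mass-format: {mode}")
    return f"{residue}[{label}]"


# The example from the text: D with mass 115 and modifications +14 and +2.
for mode in ("mod-only", "total", "separate"):
    print(mode, format_modified_residue("D", 115, [14, 2], mode))
```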
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = none.
--output-dir <filename>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite <T|F>
– Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
--parameter-file <filename>
– A file containing command-line or additional parameters. See the parameter documentation page for details.
--feature-file <string>
– Optional file in which to write the features. Default = none.
--verbosity <0-100>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
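The cumulative behavior of the verbosity levels can be illustrated with a short sketch. The level numbers and descriptions are taken from the text above; the function name messages_printed and the handling of in-between values such as 25 (treated as the next lower listed level) are assumptions.

```python
VERBOSITY_LEVELS = [
    (0, "fatal errors"),
    (10, "non-fatal errors"),
    (20, "warnings"),
    (30, "information on the progress of execution"),
    (40, "more progress information"),
    (50, "debug info"),
    (60, "detailed debug info"),
]


def messages_printed(verbosity):
    """List the message classes shown at a given verbosity (0-100)."""
    if not 0 <= verbosity <= 100:
        raise ValueError("verbosity must be between 0 and 100")
    # Each level includes every message class from the lower levels.
    return [label for level, label in VERBOSITY_LEVELS if level <= verbosity]


# The default level, 30, shows errors, warnings and basic progress messages.
print(messages_printed(30))
```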