crux q-ranker
Description:
Q-ranker dynamically learns to separate target from decoy PSMs. The algorithm is described in this article:
Marina Spivak, Jason Weston, Leon Bottou and William Stafford Noble. "Direct q value optimization methods for peptide identification from shotgun proteomics data sets." Journal of Proteome Research. 8(7):3737-3745, 2009.
For more on q-values and posterior error probabilities (PEP), see the documentation for calibrate-scores.
Usage:
crux q-ranker [options] <spectra> <search results>
Required Input:
- spectra – The fragmentation spectra must be provided in MS2, mzXML or MGF format. The spectra can be specified in three different ways: (1) as a single file with suffix ".ms2", ".mzXML" or ".mgf", (2) as a text file containing a list of MS2 files, or (3) as a directory in which all the spectrum files can be found.
- search results – q-ranker recognizes search results in the tab-delimited text format produced by Crux.
Each of the two required arguments can be provided in three different ways: (1) as a single file, (2) as a text file containing a list of filenames, one per line, or (3) as a directory containing multiple files. File types are identified based on the filename extension: ".fa", ".fasta" or ".fsa" for FASTA files, ".ms2", ".mzXML" or ".mgf" for spectrum files, and ".txt" for tab-delimited text files or lists of filenames. Note that the input mode for spectra and for search results must be the same; i.e., if you provide a list of files for the spectra, then you must also provide a list of files containing your search results. This mode is specified using the --list-of-files option, described below.
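The extension rules above can be illustrated with a short Python sketch. It is not taken from Crux; the helper name `classify_input` and the case-insensitive comparison of extensions are assumptions made for illustration only.

```python
import os

# Extension groups as listed in the text above (stored in lower case).
FASTA_EXTS = {".fa", ".fasta", ".fsa"}
SPECTRUM_EXTS = {".ms2", ".mzxml", ".mgf"}
TEXT_EXTS = {".txt"}


def classify_input(path):
    """Return 'directory', 'fasta', 'spectrum', 'text' or 'unknown'.

    Hypothetical helper: the name and the case-insensitive extension
    comparison are assumptions of this sketch, not Crux behaviour.
    """
    if os.path.isdir(path):
        return "directory"
    ext = os.path.splitext(path)[1].lower()
    if ext in FASTA_EXTS:
        return "fasta"
    if ext in SPECTRUM_EXTS:
        return "spectrum"
    if ext in TEXT_EXTS:
        return "text"
    return "unknown"
```

A ".txt" argument may be either a results file or a list of filenames, so the extension alone cannot tell the two apart; that is why the input mode has to be stated explicitly on the command line.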
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- qranker.target.pep.xml: An XML file containing a ranked list of target peptide-spectrum matches (PSMs). The following entries are included:
- scan: the scan number
- charge: the inferred charge state
- psm_id: PSM identifier.
- q-value: The minimal PSM-level false discovery rate at which this PSM is deemed significant. This q-value is computed based on the ranking of the PSMs induced by the q-ranker score.
- score: The score assigned to the PSM by q-ranker. Higher values correspond to more confident identifications.
- precursor_mass: precursor mass as recorded during the MS1 scan
- peptide: the peptide sequence
- filename: name of the file in which the PSM appears
- qranker.decoy.pep.xml: An XML file containing a ranked list of decoy peptide-spectrum matches.
- qranker.target.psm.txt: A tab-delimited text file containing a ranked list of target peptide-spectrum matches with the associated Q-ranker scores and q-values.
- qranker.decoy.psm.txt: A tab-delimited text file containing a ranked list of decoy peptide-spectrum matches with the associated Q-ranker scores and q-values.
- qranker.log.txt: a file where the program reports its progress.
- qranker.params.txt: a file with the values of all the options given to the current run.
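To make the q-value entry above concrete, here is a minimal Python sketch of the usual target-decoy estimate: PSMs are ranked by score, the FDR at each rank is estimated as the number of decoys divided by the number of targets seen so far, and the q-value is the smallest FDR at that rank or any lower-ranked threshold. This illustrates the general technique under those assumptions; it is not Q-ranker's own code, and the function name and input layout are hypothetical.

```python
def target_decoy_qvalues(psms):
    """Compute q-values for a list of (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by decreasing score.
    Hypothetical helper written for this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # Pass 1: estimated FDR at each rank, walking down the ranked list.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / targets) if targets else 1.0)

    # Pass 2: q-value = minimum FDR at this rank or any rank below it.
    qvals = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvals.append(running_min)
    qvals.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]
```

The second pass is what makes the q-value monotone in the ranking: a PSM never receives a larger q-value than a PSM ranked below it.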
Options:
- --enzyme trypsin|chymotrypsin|elastase – The enzyme used to digest the proteins in the experiment. Default = trypsin.
- --decoy-prefix <string> – Specifies the prefix of the protein names that indicates a decoy. Default = rand_.
- --separate-searches <search results> – If the target and decoy searches were run separately, rather than using a concatenated database, then Q-ranker will assume that the database search results provided as a required argument are from the target database search. This option then allows the user to specify the location of the decoy search results. Like the required arguments, these search results can be provided as a single file, a list of files or a directory. However, the choice (file, list or directory) must be consistent for the MS2 files and the target and decoy search results files. Also, if the MS2 and search results files are provided in directories, then Q-ranker will use the MS2 filename (foo.ms2) to identify corresponding target and decoy search result files with names like foo*.target.txt and foo*.decoy.txt. This naming convention allows the target and decoy tab-delimited files to reside in the same directory.
- --fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = none.
- --output-dir <directory> – The name of the directory where output files will be created. Default = crux-output.
- --overwrite <T|F> – This option applies when the output directory specified for the run already exists. If set to T, Q-ranker will overwrite the contents of that directory. Default = F.
- --skip-cleanup <T|F> – Q-ranker analysis begins with a pre-processing step that creates a set of lookup tables, which are then used during training. Normally, these lookup tables are deleted at the end of the Q-ranker analysis, but setting this option to T prevents the deletion of these tables. Subsequently, the Q-ranker analysis can be repeated more efficiently by specifying the --re-run option (see below). Default = F.
- --re-run <directory> – Re-run a previous Q-ranker analysis using a previously computed set of lookup tables. For this option to work, --skip-cleanup must have been set to T when Q-ranker was run the first time.
- --use-spec-features <T|F> – By default, Q-ranker uses an enriched feature set derived from the spectra in the ms2 files. It can be forced to use a minimal feature set by setting this option to F. Default = T.
- --parameter-file <filename> – A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
- --feature-file <T|F> – Optional file into which PSM features are printed. Default = F.
- --spectrum-parser pwiz|mstoolkit – Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser should be able to read the MS/MS file formats that ProteoWizard supports. The alternative is the MSToolkit parser. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = pwiz.
- --list-of-files <T|F> – Specify that the spectra and search results are provided as lists of files, rather than as individual files. When the spectrum files and the database search results files are provided via a file listing, q-ranker assumes that the order of the spectrum files matches the order of the search result files. Alternatively, when the spectrum files and search results files are provided via directories, q-ranker will search for pairs of files with the same root name but different extensions (".ms2", ".mzXML" or ".mgf" for the spectrum file and ".txt" for the search results). Default = F.
- --verbosity <0-100> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0 = fatal errors, 10 = non-fatal errors, 20 = warnings, 30 = information on the progress of execution, 40 = more progress information, 50 = debug info, 60 = detailed debug info. Default = 30.
- --txt-output <T|F> – Output a tab-delimited results file to the output directory. Default = T.
- --pepxml-output <T|F> – Output a pepXML results file to the output directory. Default = F.
Selected Examples of Use:
- Search was done against concatenated target-decoy database.
- To analyse single files, with decoy prefix random_ (the default for the decoy prefix is set to decoy_). The output is in the directory crux-output by default:
crux q-ranker --decoy-prefix random_ spectra.ms2 matches.txt
- Assume that directory spectra-dir contains 10 ms2 files, and directory matches-dir contains the 10 tab-delimited text files corresponding to searches against a single target-decoy database, with decoy prefix reverse_. To analyse lists of ms2 and tab-delimited files in the directories, overwriting previous results:
crux q-ranker --decoy-prefix reverse_ --overwrite T spectra-list.txt matches-list.txt
- To specify directories with ms2 and tab-delimited files, and the output directory called results-dir:
crux q-ranker --decoy-prefix reverse_ --output-dir results-dir spectra-dir matches-dir
- Q-ranker uses the enriched feature set by default. When analysing a large dataset, it may be desirable to speed up the feature extraction step by reverting to the minimal feature set:
crux q-ranker --decoy-prefix reverse_ --output-dir results --overwrite T --use-spec-features F spectra-list.txt matches-list.txt
- Separate searches were done against target and decoy databases.
The searches against the target database are provided as the required <search results> argument. To specify the searches against the decoy database, use the --separate-searches option.
- To analyse a single "ms2" file and "txt" files matches-target.txt and matches-decoy.txt:
crux q-ranker --decoy-prefix random_ --separate-searches matches-decoy.txt spectra.ms2 matches-target.txt
- If ms2 files are in spectra-dir directory, tab-delimited files are in matches-target-dir and matches-decoy-dir:
crux q-ranker --decoy-prefix random_ --separate-searches matches-decoy-dir spectra-dir matches-target-dir
- If ms2 files are listed in spectra-list.txt, tab-delimited files are listed in matches-target-list.txt and matches-decoy-list.txt:
crux q-ranker --decoy-prefix random_ --separate-searches matches-decoy-list.txt spectra-list.txt matches-target-list.txt
- Running Q-ranker with Crux database search
- Please use crux search-for-matches because it produces database search results in tab-delimited text format, compatible with q-ranker input (see search-for-matches).
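For the list-based examples above, the two list files must name the spectrum files and the search result files in the same order. The Python sketch below shows one way to build such lists by pairing files that share a root name, mirroring the pairing rule described for directory input. The helper names `pair_by_root` and `write_lists` are hypothetical, and the sketch assumes each result file is named after its spectrum file with a ".txt" extension.

```python
import os

SPECTRUM_EXTS = (".ms2", ".mzxml", ".mgf")


def pair_by_root(spectrum_files, result_files):
    """Pair spectrum files with ".txt" result files sharing a root name.

    Returns (spectrum, result) tuples sorted by spectrum file name and
    raises ValueError if a spectrum file has no matching result file.
    """
    results = {}
    for path in result_files:
        root, ext = os.path.splitext(os.path.basename(path))
        if ext.lower() == ".txt":
            results[root] = path

    pairs = []
    for path in sorted(spectrum_files, key=os.path.basename):
        root, ext = os.path.splitext(os.path.basename(path))
        if ext.lower() not in SPECTRUM_EXTS:
            continue  # ignore files that are not spectra
        if root not in results:
            raise ValueError("no search results found for " + path)
        pairs.append((path, results[root]))
    return pairs


def write_lists(pairs, spectra_list, matches_list):
    """Write the two list files, one filename per line, in matching order."""
    with open(spectra_list, "w") as s, open(matches_list, "w") as m:
        for spectrum, result in pairs:
            s.write(spectrum + "\n")
            m.write(result + "\n")
```

Calling `write_lists(pairs, "spectra-list.txt", "matches-list.txt")` would then produce the two list files used in the examples. Failing loudly on an unmatched spectrum file is a deliberate choice here, since a silently skipped file would shift the order of one list relative to the other.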