crux sequest-search
Usage:
crux sequest-search [options] <ms2 input filename> <protein input>
Description:
This command searches a protein database with a set of spectra, using an algorithm similar to that employed by the SEQUEST database search software. This command differs from the crux search-for-matches command in the following respects:
- Candidate peptides (i.e., peptides that lie within the user-specified precursor window) are first ranked with the SEQUEST Sp score. The top 500 matches are then re-scored using XCorr.
- The program will not compute p-values using the Weibull empirical curve fitting procedure.
- By default, the theoretical spectrum used to compute XCorr includes two flanking peaks on either side of each b- and y-ion. These can be turned off using the use-flanking-peaks parameter.
- The program produces SQT format output files.
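The two-stage ranking in the first bullet above can be sketched as follows. This is a minimal illustration, not Crux's implementation: `sp_score` and `xcorr_score` are hypothetical callables supplied by the caller, and the sketch assumes that a higher value is better for both scores. The cutoff of 500 is the default named in the text.

```python
def two_stage_search(candidates, sp_score, xcorr_score, max_rank_preliminary=500):
    """Rank candidates by a preliminary score, then re-score the best ones.

    sp_score and xcorr_score are hypothetical stand-ins for the real
    Sp and XCorr scoring functions; higher is assumed to be better.
    """
    # Stage 1: order every candidate peptide by the preliminary (Sp-like) score.
    by_sp = sorted(candidates, key=sp_score, reverse=True)

    # Keep only the top-ranked candidates for the more expensive score.
    shortlist = by_sp[:max_rank_preliminary]

    # Stage 2: re-score the shortlist (XCorr-like) and sort by the new score.
    rescored = [(xcorr_score(peptide), peptide) for peptide in shortlist]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored
```

The design point is that the second score is only computed for the shortlist, so a candidate ranked below the cutoff by the first score is never re-scored.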
Similar to crux search-for-matches, the input protein database may either be in FASTA format or it may be a binary index created by crux create-index. Using an index will typically yield much faster search speeds.
This command handles modifications in the same way as crux search-for-matches.
Input:
- <ms2 input filename> – The name of the file (in MS2 format) from which to parse the spectra.
- <protein input> – The name of the file in FASTA format or the directory containing a protein index from which to retrieve proteins and peptides.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- sequest.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- sequest.target.sqt: an SQT file containing the PSMs.
- sequest.target.txt: a tab-delimited text file containing the PSMs. See txt file format for a list of the fields. This and the sequest.decoy.txt file(s) can be used as input for crux post-search functions such as q-ranker.
- sequest.log.txt: a log file containing a copy of all messages that were printed to stderr.
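As a rough sketch of how a script might assemble the command line and locate the files listed above, the helper below builds the argument list from the usage line and returns the default output paths. The function names and the file names `spectra.ms2` and `proteins.fasta` are hypothetical; only the `--output-dir` and `--overwrite` options and the four output file names come from this page.

```python
import os


def build_sequest_search_command(ms2_file, protein_input,
                                 output_dir=None, overwrite=None):
    """Return the argv list for: crux sequest-search [options] <ms2> <protein>."""
    cmd = ["crux", "sequest-search"]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if overwrite is not None:
        # The option takes T or F rather than acting as a bare switch.
        cmd += ["--overwrite", "T" if overwrite else "F"]
    # Positional arguments come last, in the order given by the usage line.
    cmd += [ms2_file, protein_input]
    return cmd


def expected_output_files(output_dir="crux-output"):
    """Paths of the files this page says are always written (no decoy files)."""
    names = [
        "sequest.params.txt",
        "sequest.target.sqt",
        "sequest.target.txt",
        "sequest.log.txt",
    ]
    return [os.path.join(output_dir, name) for name in names]


if __name__ == "__main__":
    print(build_sequest_search_command("spectra.ms2", "proteins.fasta",
                                       output_dir="results", overwrite=True))
    print(expected_output_files("results"))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would launch the search, assuming the `crux` executable is on the PATH.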
If decoys are enabled using --num-decoys-per-target, then search.decoy.sqt and search.decoy.txt are also produced.
Options:
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = none.
--output-dir <filename>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite <T|F>
– Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
--decoys <string>
– Include a decoy version of every peptide by shuffling or reversing the target sequence. Possible values are none, reverse, protein-shuffle, peptide-shuffle. Use 'reverse' to reverse each protein sequence, 'protein-shuffle' to shuffle each protein sequence, or 'peptide-shuffle' to shuffle the sequence between enzyme cleavage sites, leaving the termini in place. Use 'none' for no decoys. Default = peptide-shuffle.
--num-decoys-per-target <n>
– Specify the number of decoy peptides to search for every target peptide searched. Control where the decoys are returned (to what files) with --decoy-location. At least one decoy set (in its own file) is required to run the algorithm 'percolator' in a subsequent crux run. Default = 2.
--decoy-location <target-file | one-decoy-file | separate-decoy-files>
– File(s) in which decoy results are returned. Only applies when num-decoys-per-target is not zero. Use 'target-file' to mix target and decoy PSMs in one file. Use 'one-decoy-file' to print target PSMs to one file and all decoys to a separate file. Use 'separate-decoy-files' to print one .txt file for each decoy set. (crux percolator accepts up to two search.decoy.txt files. crux q-ranker accepts only one search.decoy.txt file.) Default = separate-decoy-files.
--spectrum-parser <pwiz|mstoolkit>
– Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser should be able to read the supported MS/MS file formats. The alternative is the MSToolkit parser. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = pwiz.
--spectrum-min-mz <float>
– The lowest spectrum m/z to search in the ms2 file. Default = 0.0.
--spectrum-max-mz <float>
– The highest spectrum m/z to search in the ms2 file. Default = no maximum.
--spectrum-charge <1|2|3|all>
– The spectrum charges to search. With 'all', every spectrum will be searched, and spectra with multiple charge states will be searched once at each charge state. With 1, 2, or 3, only spectra with that charge will be searched. Default = all.
--scan-number <number range>
– A single scan number or a range of numbers to be searched. A range should be specified as 'first-last', which will include scans 'first' and 'last'. Default = search all spectra.
--parameter-file <filename>
– A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
--verbosity <0-100>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
Parameter file options:
--min-peaks <int>
– The minimum number of peaks a spectrum must have in order to be searched. Default = 20.
fragment-mass <average|mono>
– Which isotopes to use in calculating fragment ion mass (average, mono). Default = mono.
use-flanking-peaks <T|F>
– Turn on or off the peaks flanking the b/y ions. For crux search-for-matches, default = F; for crux sequest-search, default = T; for crux search-for-xlinks, default = T.
precursor-window <float>
– Tolerance used for matching peptides to spectra. Peptides must be within +/- 'precursor-window' of the spectrum value. The units of the precursor window depend upon precursor-window-type. Default = 3.0.
precursor-window-type <mass|mz|ppm>
– Specify the units for the window that is used to select peptides around the precursor mass location (mass, mz, ppm). The magnitude of the window is defined by the precursor-window option, and candidate peptides must fall within this window. For the mass window-type, the spectrum precursor singly charged mass (m+h) is converted to mass, and the window is defined as that mass ± precursor-window. If the m+h value is not available, then the mass is calculated from the precursor mass-to-charge (m/z) and provided charge. The peptide mass is computed as the sum of the average amino acid masses plus 18 Da for the terminal OH group. The mz window-type calculates the window as spectrum precursor m/z ± precursor-window and then converts the resulting m/z range to the peptide mass range using the precursor charge. For the parts-per-million (ppm) window-type, the spectrum mass is calculated as in the mass type. The lower bound of the mass window is then defined as the spectrum mass / (1.0 + (precursor-window / 1000000)) and the upper bound is defined as the spectrum mass / (1.0 - (precursor-window / 1000000)). Default = mass.
top-match <int>
– The number of PSMs per spectrum written to the output files. Default = 5.
max-rank-preliminary <int>
– The number of PSMs to score with XCorr after ranking PSMs by Sp. Default = 500.
mod <mass change>:<aa list>:<max per peptide>:<prevents cleavage>:<prevents cross-link>
– Consider modifications on any amino acid in aa list with at most max-per-peptide in one peptide. This parameter may be included with different values multiple times so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The prevents cleavage and prevents cross-link fields are optional T/F arguments describing whether the modification prevents enzymatic cleavage or cross-linking, respectively. The same modifications must be given for any post-search process (crux compute-q-values, crux q-ranker, crux percolator). Default = no variable modifications.
cmod <mass change>:<max distance from protein C-terminus>
– Consider modifications on the C-terminus of any peptide whose C-terminus is no more than max-distance residues from the protein C-terminus. Use -1 to consider the C-terminus of all peptides regardless of position in the protein. This parameter may be included with different values multiple times so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The same modifications must be given for any post-search process (crux compute-q-values, crux q-ranker, crux percolator). Default = no c-terminal modifications.
nmod <mass change>:<max distance from protein N-terminus>
– Consider modifications on the N-terminus of any peptide whose N-terminus is no more than max-distance residues from the protein N-terminus. Use -1 to consider the N-terminus of all peptides regardless of position in the protein. This parameter may be included with different values multiple times so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The same modifications must be given for any post-search process (crux compute-q-values, crux q-ranker, crux percolator). Default = no n-terminal modifications.
cmod-fixed <mass change>
– Add a modification of the given mass change to the C-terminus of every peptide.
nmod-fixed <mass change>
– Add a modification of the given mass change to the N-terminus of every peptide.
max-mods <int>
– The maximum number of modifications that can be applied to a single peptide. Default = no limit.
max-aas-modified <int>
– The maximum number of modified amino acids that can appear in one peptide. Each aa can be modified multiple times. Default = no limit.
mod-mass-format <mod-only|total|separate>
– Specify how sequence modifications are reported in various output files. Each modification is reported as a number enclosed in square brackets following the modified residue; however, the number may correspond to one of three different masses: (1) 'mod-only' reports the value of the mass shift induced by the modification; (2) 'total' reports the mass of the residue with the modification (residue mass plus modification mass); (3) 'separate' is the same as 'mod-only', but multiple modifications to a single amino acid are reported as a comma-separated list of values. For example, suppose amino acid D has an unmodified mass of 115 as well as two modifications of masses +14 and +2. In this case, the amino acid would be reported as D[16] with 'mod-only', D[131] with 'total', and D[14,2] with 'separate'.
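The three precursor-window-type settings described earlier in this list can be sketched as below. This is an illustration under stated assumptions rather than the Crux code: the function names are hypothetical, the proton mass constant is an assumed value used to convert m/z to neutral mass, and the sketch always starts from the precursor m/z and charge (the case the text describes for when no m+h value is available).

```python
PROTON_MASS = 1.00727646688  # Da; assumed constant, not taken from this page


def neutral_mass_from_mz(mz, charge):
    """Convert a precursor m/z and charge to a neutral mass."""
    return (mz - PROTON_MASS) * charge


def precursor_mass_window(precursor_mz, charge, window=3.0, window_type="mass"):
    """Return (low, high) peptide mass bounds for one spectrum."""
    if window_type == "mass":
        # Window is the spectrum mass plus or minus precursor-window.
        mass = neutral_mass_from_mz(precursor_mz, charge)
        return mass - window, mass + window
    if window_type == "mz":
        # Window is taken in m/z, then both ends are converted to mass.
        low = neutral_mass_from_mz(precursor_mz - window, charge)
        high = neutral_mass_from_mz(precursor_mz + window, charge)
        return low, high
    if window_type == "ppm":
        # Bounds follow the two formulas given for the ppm window-type.
        mass = neutral_mass_from_mz(precursor_mz, charge)
        low = mass / (1.0 + window / 1000000)
        high = mass / (1.0 - window / 1000000)
        return low, high
    raise ValueError("window_type must be 'mass', 'mz', or 'ppm'")
```

Note that with the mz type the width of the mass window scales with the charge, whereas with the mass type it is always twice the precursor-window value.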
precision <int>
– Set the precision (number of significant digits) for scores written to text files. Default = 8.
print-search-progress <int>
– Show search progress by printing every n spectra searched. Set to 0 to show no search progress. Default = 10.
reverse-sequence <T|F>
– Generate decoy sequences by reversing the peptide rather than by shuffling. The first and last residues of the sequence are not changed. If the target sequence is a palindrome (the same when reversed), then the decoy will be generated by shuffling and a note to that effect will be printed at verbosity level 40 (DETAILED INFO). Default = generate decoys by shuffling.
NOTE: the following parameters are also used when creating an index and must be compatible with any index used.
min-mass <float>
– The minimum neutral mass of the peptides to place in the index. Default = 200.
max-mass <float>
– The maximum neutral mass of the peptides to place in the index. Default = 7200.
min-length <int>
– The minimum length of the peptides to place in the index. Default = 6.
max-length <int>
– The maximum length of the peptides to place in the index. Default = 50.
--enzyme <trypsin|trypsin/p|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|asp-n|lys-c|lys-n|arg-c|glu-c|pepsin-a|elastase-trypsin-chymotrypsin|no-enzyme>
– Enzyme to use for in silico digestion of protein sequences. Used in conjunction with the digestion and missed-cleavages options. Use 'no-enzyme' for non-specific digestion. Digestion rules are as follows: enzyme name [cuts after one of these residues]|{but not before one of these residues}. trypsin [RK]|{P}, trypsin/p [RK]|[], elastase [ALIV]|{P}, chymotrypsin [FWYL]|{P}, clostripain [R]|[], cyanogen-bromide [M]|[], iodosobenzoate [W]|[], proline-endopeptidase [P]|[], staph-protease [E]|[], elastase-trypsin-chymotrypsin [ALIVKRWFY]|{P}, asp-n []|[D], lys-c [K]|{P}, lys-n []|[K], arg-c [R]|{P}, glu-c [DE]|{P}, pepsin-a [FL]|{P}. Default = trypsin.
custom-enzyme <residues before cleavage | residues after cleavage>
– Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, enclosed in square brackets or curly braces and separated by a |. The first list contains residues required/prohibited before the cleavage site, and the second list contains residues after the cleavage site. If the residues are required for digestion, they are enclosed in square brackets, '[' and ']'. If the residues prevent digestion, they are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D].
digestion <full-digest|partial-digest>
– Degree of digestion used to generate peptides. Either both ends (full-digest) or at least one end (partial-digest) of a peptide must conform to enzyme specificity rules. Used in conjunction with the enzyme or custom-enzyme option when enzyme is not set to 'no-enzyme'. Default = full-digest.
missed-cleavages <T|F>
– Allow missed cleavage sites within a peptide. When used with enzyme and set to true, includes peptides containing one or more potential cleavage sites. Default = F.
isotopic-mass <average|mono>
– Specify the type of isotopic masses to use when calculating the peptide mass. Default = average.
<A-Z> <float>
– Specify static modifications. This is a mass change applied to the given amino acid (in single-letter code, A through Z) for every peptide in which it occurs. Use the mod option for generating peptides both with and without the mass change. Default: C=57.
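To make the custom-enzyme rule syntax above concrete, here is a minimal sketch of a full digest with no missed cleavages. It is not the Crux implementation: the function names are hypothetical, mass filtering is omitted, and an empty list such as [] is assumed to mean "no restriction", which matches how the built-in rules above use it.

```python
import re


def parse_rule_side(spec):
    """Turn one side of a rule, such as '[RK]' or '{P}', into a predicate."""
    match = re.fullmatch(r"\[([A-Z]*)\]|\{([A-Z]*)\}", spec.strip())
    if match is None:
        raise ValueError("bad rule side: %r" % spec)
    if match.group(1) is not None:
        # Square brackets: residue must be in the list (X or empty = any).
        residues = set(match.group(1))
        if not residues or "X" in residues:
            return lambda aa: True
        return lambda aa: aa in residues
    # Curly braces: residue must NOT be in the list (X = every residue).
    residues = set(match.group(2))
    if "X" in residues:
        return lambda aa: False
    return lambda aa: aa not in residues


def digest(protein, rule="[RK]|{P}", min_length=6, max_length=50):
    """Fully digest a protein sequence, keeping peptides within length limits."""
    before_spec, after_spec = rule.split("|")
    before_ok = parse_rule_side(before_spec)
    after_ok = parse_rule_side(after_spec)

    # A cleavage site at i lies between protein[i - 1] and protein[i].
    sites = [0]
    for i in range(1, len(protein)):
        if before_ok(protein[i - 1]) and after_ok(protein[i]):
            sites.append(i)
    sites.append(len(protein))

    peptides = [protein[a:b] for a, b in zip(sites, sites[1:])]
    return [p for p in peptides if min_length <= len(p) <= max_length]
```

For example, under the trypsin rule `[RK]|{P}` the sequence `AKRPGKDE` is cut after the first K and after the second K, but not between R and P.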