crux search-for-xlinks
Usage:
crux search-for-xlinks [options] <ms2 input filename> <protein database> <link sites> <link mass>
Description:
This command searches a protein database with a set of spectra. For each spectrum, the precursor mass is computed from either the precursor singly charged mass (m+h) or the mass-to-charge ratio (m/z) and an assumed charge. Candidate molecules are linear peptides, dead-end products, self-loop products or cross-linked products whose mass lies within a specified range of the precursor mass. These candidates are ranked using XCorr. The input protein database is in FASTA format.
The algorithm is described in more detail in the following article:
Sean McIlwain, Paul Draghicescu, Pragya Singh, David R. Goodlett and William Stafford Noble. "Detecting cross-linked peptides by searching against a database of cross-linked peptide pairs." Journal of Proteome Research. 2010.
Modifications:
Currently, crux search-for-xlinks supports static modifications (a change of mass applied to a given amino acid in every peptide in which it occurs). By default, a static modification of +57 Da to cysteine (C) is applied. Variable modifications (allowing peptides to be generated with and without a mass change to a given amino acid) are supported when use-old-xlink=F. Static and variable modifications can be specified in the parameter file, as described below.
Input:
- <ms2 input filename> – The name of the file from which to parse the MS/MS spectra. The file can be in any format supported by ProteoWizard.
- <protein database> – The name of the file in FASTA format from which to retrieve proteins and peptides.
- <link sites> – A comma-delimited list of amino acid pairs that the cross-linker can join. For example, "A:K,A:D" means that the cross-linker can attach A to K or A to D. Cross-links involving the N-terminus of a protein can be specified as a link site by using "nterm". For example, "nterm:K" means that a cross-link can attach a protein's N-terminus to a lysine.
- <link mass> – The mass modification of the linker when attached to a peptide.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- search-for-xlinks.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- search-for-xlinks.target.txt: a tab-delimited text file containing the peptide-spectrum matches (PSMs). See the txt file format for a list of the fields.
- search-for-xlinks.decoy.txt: a tab-delimited text file containing the decoy PSMs. See the txt file format for a list of the fields.
- search-for-xlinks.qvalues.txt: a tab-delimited text file containing the top ranked PSMs with calculated q-values. See the txt file format for a list of the fields.
- search-for-xlinks.log.txt: a log file containing a copy of all messages that were printed to stderr.
Options:
Cross-linking parameters
--use-old-xlink T|F
– Use the old version of the xlink search algorithm. When False, a new version of the code is run. The new version supports variable modifications and can handle more complex databases. This new code is still in development and should be considered a beta release. Default = T.
--xlink-include-linears T|F
– Include linear peptides in the search. Default = T.
--xlink-include-deadends T|F
– Include dead-end products in the search. Default = T.
--xlink-include-selfloops T|F
– Include self-loop products in the search. Default = T.
--xlink-prevents-cleavage <string>
– List of amino acids for which the cross-linker can prevent cleavage.
--max-xlink-mods <integer>
– Specify the maximum number of modifications allowed on a cross-linked peptide. This option is only available when use-old-xlink=F.
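The <link sites> argument listed under Input pairs two sites with a colon and separates pairs with commas, with "nterm" standing for a protein N-terminus. The Python sketch below shows one way to parse such a string and test whether two sites may be linked. It assumes that links are symmetric (A:K also allows K to A). The function names are hypothetical rather than Crux internals.

```python
# Hypothetical parser for a <link sites> string such as "A:K,A:D,nterm:K".
# Assumption: a link site X:Y also permits Y:X (links are treated as symmetric).


def parse_link_sites(spec: str) -> set[frozenset[str]]:
    """Return the allowed link pairs as unordered sets of site names."""
    pairs: set[frozenset[str]] = set()
    for item in spec.split(","):
        left, sep, right = item.strip().partition(":")
        if not sep or not left or not right:
            raise ValueError(f"malformed link site: {item!r}")
        pairs.add(frozenset((left, right)))
    return pairs


def can_link(pairs: set[frozenset[str]], site1: str, site2: str) -> bool:
    """True if site1 and site2 (residue letters or 'nterm') may be cross-linked."""
    return frozenset((site1, site2)) in pairs


if __name__ == "__main__":
    sites = parse_link_sites("A:K,A:D,nterm:K")
    print(can_link(sites, "K", "A"), can_link(sites, "nterm", "K"),
          can_link(sites, "D", "K"))
```

A frozenset is used so that "A:K" and "K:A" compare equal. A homotypic pair such as "K:K" collapses to a single-element set, which still matches correctly.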
Peptide properties
--min-mass <float>
– The minimum neutral mass of the peptides to place in the index. Default = 200.
--max-mass <float>
– The maximum neutral mass of the peptides to place in the index. Default = 7200.
--min-length <int>
– The minimum length of the peptides to place in the index. Default = 4.
--max-length <int>
– The maximum length of the peptides to place in the index. Default = 50.
Amino acid modifications
--mod <mass change>:<aa list>:<max per peptide>:<prevents cleavage>:<prevents cross-link>
– Consider modifications on any amino acid in aa list, with at most max per peptide modifications in one peptide. This parameter may be included with different values multiple times so long as the total number of mod, cmod, and nmod parameters does not exceed 11. The prevents cleavage and prevents cross-link fields are optional T/F arguments that describe whether the modification prevents enzymatic cleavage or cross-linking, respectively. Default = no variable modifications. This option is only available when use-old-xlink=F.
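To make the field order of the --mod value concrete, here is a hedged Python sketch. It splits a specification such as "79.966331:STY:1:F:T" into its five fields and treats the two trailing T/F flags as optional, defaulting them to F. The F default, the VariableMod dataclass, and the example mass are assumptions for illustration, and Crux's own parser may behave differently.

```python
from dataclasses import dataclass


@dataclass
class VariableMod:
    """Hypothetical container for one --mod specification."""
    mass_change: float
    aa_list: str
    max_per_peptide: int
    prevents_cleavage: bool = False    # assumed default when the field is omitted
    prevents_cross_link: bool = False  # assumed default when the field is omitted


def parse_tf(value: str) -> bool:
    """Convert a T/F flag to a bool, rejecting anything else."""
    if value not in ("T", "F"):
        raise ValueError(f"expected T or F, got {value!r}")
    return value == "T"


def parse_mod(spec: str) -> VariableMod:
    """Parse '<mass change>:<aa list>:<max per peptide>[:<prevents cleavage>[:<prevents cross-link>]]'."""
    fields = spec.split(":")
    if not 3 <= len(fields) <= 5:
        raise ValueError(f"expected 3 to 5 fields, got {len(fields)}")
    mod = VariableMod(float(fields[0]), fields[1], int(fields[2]))
    if len(fields) >= 4:
        mod.prevents_cleavage = parse_tf(fields[3])
    if len(fields) == 5:
        mod.prevents_cross_link = parse_tf(fields[4])
    return mod


if __name__ == "__main__":
    print(parse_mod("79.966331:STY:1:F:T"))
    print(parse_mod("15.9949:M:2"))
```

This sketch handles a single value only. The limit of 11 combined mod, cmod and nmod parameters would have to be enforced separately, across all parsed specifications.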
--max-mods <int>
– The maximum number of modifications that can be applied to a single peptide. Default = no limit. This option is only available when use-old-xlink=F.
--<A-Z> <float>
– Specify static modifications. This is a mass change applied to the given amino acid (in single-letter code, A through Z) for every peptide in which it occurs. Use the mod option for generating peptides both with and without the mass change. Default C=57.0214637206.
Enzymatic digestion
--enzyme <string>
– Specify the enzyme used to digest the proteins in silico. Available enzymes (with the corresponding digestion rules indicated in parentheses) include no-enzyme ([X]|[X]), trypsin ([RK]|{P}), trypsin/p ([RK]|[]), chymotrypsin ([FWYL]|{P}), elastase ([ALIV]|{P}), clostripain ([R]|[]), cyanogen-bromide ([M]|[]), iodosobenzoate ([W]|[]), proline-endopeptidase ([P]|[]), staph-protease ([E]|[]), asp-n ([]|[D]), lys-c ([K]|{P}), lys-n ([]|[K]), arg-c ([R]|{P}), glu-c ([DE]|{P}), pepsin-a ([FL]|{P}), elastase-trypsin-chymotrypsin ([ALIVKRWFY]|{P}). Specifying --enzyme no-enzyme yields a non-enzymatic digest. Warning: the resulting index may be quite large. Default = trypsin.
--custom-enzyme <residues before cleavage>|<residues after cleavage>
– Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, and separated by a |. The first list contains residues required or prohibited before the cleavage site, and the second list contains residues required or prohibited after the cleavage site. If the residues are required for digestion, they are enclosed in square brackets, '[' and ']'. If the residues prevent digestion, they are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D].
--digestion full-digest|partial-digest
– Degree of digestion used to generate peptides (full-digest, partial-digest). For full-digest, both ends of a peptide must conform to the enzyme specificity rules; for partial-digest, at least one end must. Used in conjunction with the enzyme option when enzyme is not set to 'no-enzyme'. Default = full-digest.
--missed-cleavages <int>
– The maximum number of missed cleavage sites allowed within a peptide. When an enzyme is specified, peptides containing up to this many internal potential cleavage sites are included. Default = 0.
Search parameters
--spectrum-min-mz <float>
– The lowest spectrum m/z to search in the ms2 file. Default = 0.0.
--spectrum-max-mz <float>
– The highest spectrum m/z to search in the ms2 file. Default = no maximum.
--spectrum-charge 1|2|3|all
– The spectrum charges to search. With 'all', every spectrum will be searched, and spectra with multiple charge states will be searched once at each charge state. With 1, 2, or 3, only spectra with that charge will be searched. Default = all.
--compute-sp T|F
– Compute the preliminary Sp score for all candidate peptides. This is recommended if results are to be analyzed by percolator or q-ranker. Default = F.
--precursor-window <float>
– Tolerance used for matching peptides to spectra. Peptides must be within +/- precursor-window of the spectrum mass. The definition of the precursor window depends upon precursor-window-type. Default = 3.0.
--precursor-window-type mass|mz|ppm
– Specify the units for the window that is used to select peptides around the precursor mass location (mass, mz, ppm). The magnitude of the window is defined by the precursor-window option, and candidate peptides must fall within this window. For the mass window-type, the spectrum precursor m+h value is converted to mass, and the window is defined as that mass ± precursor-window. If the m+h value is not available, then the mass is calculated from the precursor m/z and provided charge. The peptide mass is computed as the sum of the average amino acid masses plus 18 Da for the terminal H and OH groups (one water molecule). The mz window-type calculates the window as spectrum precursor m/z ± precursor-window and then converts the resulting m/z range to the peptide mass range using the precursor charge. For the parts-per-million (ppm) window-type, the spectrum mass is calculated as in the mass type. The lower bound of the mass window is then defined as spectrum mass / (1.0 + (precursor-window / 1000000)) and the upper bound as spectrum mass / (1.0 - (precursor-window / 1000000)). Default = mass.
--precursor-window-weibull <0-1e6>
– Score decoy peptides within +/- precursor-window-weibull of the precursor mass. The resulting scores are used only for fitting the Weibull distribution. Default = 20.
--precursor-window-type-weibull mass|mz|ppm
– Window type to use in conjunction with the precursor-window-weibull parameter. Default = mass.
--min-weibull-points <int>
– Keep reshuffling and collecting XCorr scores until the minimum number of points for Weibull fitting (using targets and decoys) is achieved. Default = 4000.
--max-ion-charge <int>
– Predict ions for the theoretical spectra up to max charge state (1,2,...,6) or up to the charge state of the peptide (peptide). If max-ion-charge is greater than the charge state of the peptide, then the maximum is the peptide charge. Default = 'peptide'.
--scan-number <int>|<int>-<int>
– A single scan number or a range of numbers to be searched. The range should be specified as 'first-last', which will include scans 'first' and 'last'. Default = search all spectra.
--mz-bin-width <float>
– Before calculation of the XCorr score, the m/z axes of the observed and theoretical spectra are discretized. This parameter specifies the size of each bin. The exact formula is floor((x/mz-bin-width) + 1.0 - mz-bin-offset), where x is the observed m/z value. By default, the mz-bin-width is 1.0005079 Da when searching using monoisotopic mass and 1.0011413 Da with average mass.
--mz-bin-offset <float>
– In the discretization of the m/z axes of the observed and theoretical spectra, this parameter specifies the location of the left edge of the first bin, relative to mass = 0 (i.e., mz-bin-offset = 0.xx means the left edge of the first bin will be located at +0.xx Da). The parameter must lie in the range 0 ≤ mz-bin-offset ≤ 1. Default = 0.40.
--mod-mass-format mod-only|total|separate
– Specify how sequence modifications are reported in various output files. Each modification is reported as a number enclosed in square brackets following the modified residue; however, the number may correspond to one of three different masses: (1) 'mod-only' reports the value of the mass shift induced by the modification; (2) 'total' reports the mass of the residue with the modification (residue mass plus modification mass); (3) 'separate' is the same as 'mod-only', but multiple modifications to a single amino acid are reported as a comma-separated list of values. For example, suppose amino acid D has an unmodified mass of 115 as well as two modifications of masses +14 and +2. In this case, the amino acid would be reported as D[16] with 'mod-only', D[131] with 'total', and D[14,2] with 'separate'.
--use-flanking-peaks T|F
– Turn on or off the peaks flanking the b/y ions. For crux search-for-matches, default = F; for crux search-for-xlinks, default = T.
--xcorr-use-flanks T|F
– Use flanking ions in the theoretical spectrum. These are placed +/- 1 Da around the b/y ions, with an intensity of 25.0. Default = T.
--fragment-mass average|mono
– Which isotopes to use in calculating fragment ion mass (average, mono). Default = average.
--isotopic-mass average|mono
– Specify the type of isotopic masses to use when calculating the peptide mass. Default = average.
--isotope-windows <-n,-n-1,0,1,n>
– Provides a list of isotopic windows to search. For example, -1,0,1 will search in three disjoint windows: (1) precursor_mass - neutron_mass ± window, (2) precursor_mass ± window, and (3) precursor_mass + neutron_mass ± window. The window size is defined by the precursor-window and precursor-window-type parameters. This option is only available when use-old-xlink=F.
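The window arithmetic described under --precursor-window-type and --isotope-windows can be sketched in Python. The sketch returns the (low, high) neutral-mass range for the mass, mz and ppm window types, using the formulas given for precursor-window-type. It then shifts that range by whole multiples of an isotope spacing for each requested offset. The spacing constant (1.00335483 Da, the 13C/12C mass difference), the choice to shift the whole range rather than recompute it around each shifted mass, and the function names are all assumptions, not details taken from Crux.

```python
PROTON_MASS = 1.00727646688  # Da
NEUTRON_MASS = 1.00335483    # Da; assumed isotope spacing (13C - 12C)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral precursor mass from m/z and charge."""
    return charge * (mz - PROTON_MASS)


def precursor_window(mz: float, charge: int, window: float,
                     window_type: str) -> tuple[float, float]:
    """(low, high) neutral-mass bounds for one precursor."""
    mass = neutral_mass(mz, charge)
    if window_type == "mass":
        return mass - window, mass + window
    if window_type == "mz":
        # Window applied in m/z space, then converted to mass using the charge.
        return neutral_mass(mz - window, charge), neutral_mass(mz + window, charge)
    if window_type == "ppm":
        return mass / (1.0 + window / 1e6), mass / (1.0 - window / 1e6)
    raise ValueError(f"unknown window type: {window_type!r}")


def isotope_windows(mz: float, charge: int, window: float, window_type: str,
                    offsets: list[int]) -> list[tuple[float, float]]:
    """One (low, high) range per isotope offset, e.g. offsets=[-1, 0, 1]."""
    low, high = precursor_window(mz, charge, window, window_type)
    return [(low + k * NEUTRON_MASS, high + k * NEUTRON_MASS) for k in offsets]


if __name__ == "__main__":
    for low, high in isotope_windows(700.0, 3, 10.0, "ppm", [-1, 0, 1]):
        print(f"{low:.4f} - {high:.4f}")
```

With the mass window type, each shifted range is exactly precursor_mass + k × spacing ± precursor-window, matching the three windows listed for -1,0,1. For mz and ppm, the shifted ranges are only an approximation under the stated assumption.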
--compute-p-values T|F
– Estimate the parameters of the score distribution for each spectrum by fitting to a Weibull distribution, and compute a p-value for each xlink product. This option is only available when use-old-xlink=F. Default = F.
Input and output
--spectrum-parser pwiz|mstoolkit
– Specify the parser to use for reading in MS/MS spectra. The default ProteoWizard parser should be able to read the MS/MS file formats supported by ProteoWizard. The alternative is the MSToolkit parser. If the ProteoWizard parser fails to read your files properly, you may want to try the MSToolkit parser instead. Default = pwiz.
--top-match <int>
– The number of PSMs per spectrum written to the output files. Default = 5.
--output-dir <filename>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite T|F
– Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
--parameter-file <filename>
– A file containing parameter settings. See the parameter documentation page for details.
--verbosity <0-100>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.