crux tide-index
Usage:
crux tide-index [options] <protein input file> <index name>
Description:
Tide is a tool for identifying peptides from tandem mass spectra. It is an independent reimplementation of the SEQUEST® algorithm, which assigns peptides to spectra by comparing the observed spectra to a catalog of theoretical spectra derived from a database of known proteins. Tide's primary advantage is its speed. Our published paper provides more detail on how Tide works. If you use Tide in your research, please cite:
Benjamin J. Diament and William Stafford Noble. “Faster SEQUEST Searching for Peptide Identification from Tandem Mass Spectra.” Journal of Proteome Research. 10(9):3871-9, 2011.
The tide-index command performs a required pre-processing step on the protein database, converting it to a binary format suitable for input to the tide-search command.
Tide considers only the standard set of 20 amino acids. Peptides containing non-amino acid alphanumeric characters (BJOUXZ) are skipped. Non-alphanumeric characters are ignored completely.
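The character-handling rule above can be illustrated with a minimal Python sketch. This is not Tide's code: the helper names `clean_sequence` and `is_indexable` are hypothetical, and upper-casing the input is an assumption made for the illustration.

```python
import re

# The 20 standard amino acids; any other letter (B, J, O, U, X, Z) or digit
# disqualifies a peptide in this sketch.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Drop non-alphanumeric characters (spaces, '*', '-', ...).

    Upper-casing is an assumption of this sketch, not documented behavior.
    """
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()


def is_indexable(peptide: str) -> bool:
    """Return True only if every character is a standard amino acid."""
    return bool(peptide) and set(peptide) <= STANDARD


if __name__ == "__main__":
    print(clean_sequence("MK-T*AY "))  # non-alphanumeric characters removed
    print(is_indexable("PEPTIDEK"))    # only standard residues
    print(is_indexable("PEPXIDEK"))    # contains X, so it would be skipped
```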
Input:
- <protein input file> – The name of the file in FASTA format from which to retrieve proteins.
- <index name> – The desired name of the binary index.
Output:
The program creates a binary index, using the name specified on the command line.
In addition, the program writes the following files to the folder crux-output. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- tide-index.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- tide-index.log.txt: a log file containing a copy of all messages that were printed to the screen during execution.
Options:
Peptide properties
--max-length <integer>
– Specify the maximum length of the peptides included in the index. Default=50.
--min-length <integer>
– Specify the minimum length of the peptides included in the index. Default=6.
--max-mass <float>
– Specify the maximum mass (in Da) of the peptides included in the index. Default=7200.
--min-mass <float>
– Specify the minimum mass (in Da) of the peptides included in the index. Default=200.
--monoisotopic-precursor <T|F>
– When computing the mass of a peptide, use monoisotopic masses rather than average mass values. Default = T.
--clip-nterm-methionine <T|F>
– When set to T, for each protein that begins with methionine, tide-index will put two copies of the leading peptide into the index, with and without the N-terminal methionine. Default = F.
Amino acid modifications
--mods-spec C+57.02146,2M+15.9949,1STY+79.966331,...
– The general form of a modification specification has three components, as exemplified by 1STY+79.966331. The three components are: [max_per_peptide]residues[+/-]mass_change. In the example, max_per_peptide is 1, residues are STY, and mass_change is +79.966331. To specify a static modification, the number preceding the amino acid must be omitted; i.e., C+57.02146 specifies a static modification of 57.02146 Da to cysteine. Note that Tide allows at most one modification per amino acid. By default, the static C+57.02146 modification is turned on.
--nterm-peptide-mods-spec 1E-18.0106,C-17.0265,...
--cterm-peptide-mods-spec X+21.9819,...
– These parameters specify peptide N-terminal and C-terminal modifications, respectively. Like --mods-spec, these specifications have three components, but with a slightly different syntax. The max_per_peptide field can be either "1", in which case it defines a variable terminal modification, or missing, in which case the modification is static. The residues field indicates which amino acids are subject to the modification, with the residue X corresponding to any amino acid. Finally, added_mass is defined in the same way as mass_change above. The first example above specifies a variable loss of 18.0106 Da on an N-terminal glutamic acid (E) and a static loss of 17.0265 Da on an N-terminal cysteine (C). The second example specifies a static 21.9819 Da modification on any C-terminal amino acid.
Note that, in general, each amino acid can include at most one variable modification.
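To make the three-component syntax concrete, here is a minimal Python sketch that splits a specification string into its parts. It only illustrates the format described above and is not how tide-index parses its arguments; the names `ModSpec` and `parse_mods_spec` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class ModSpec(NamedTuple):
    max_per_peptide: Optional[int]  # None means the number was omitted (static)
    residues: str                   # e.g. "STY", or "X" for any amino acid
    mass_change: float              # signed mass delta in Da


# Optional count, one or more residue letters, then a signed decimal mass.
_ITEM = re.compile(r"^(\d*)([A-Z]+)([+-]\d+(?:\.\d+)?)$")


def parse_mods_spec(spec: str) -> List[ModSpec]:
    """Parse a comma-separated list such as 'C+57.02146,2M+15.9949'."""
    mods = []
    for item in spec.split(","):
        match = _ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"malformed modification: {item!r}")
        count, residues, delta = match.groups()
        mods.append(ModSpec(int(count) if count else None, residues, float(delta)))
    return mods


if __name__ == "__main__":
    for mod in parse_mods_spec("C+57.02146,2M+15.9949,1STY+79.966331"):
        kind = "static" if mod.max_per_peptide is None else "variable"
        print(kind, mod)
```

The same pattern accepts the terminal examples (1E-18.0106, C-17.0265, X+21.9819), since they share the count, residues, and signed mass layout.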
--max-mods <int>
– The maximum number of modifications that can be applied to a single peptide. Default = no limit.
Decoy database generation
--decoy-format none|shuffle|peptide-reverse|protein-reverse
– Include a decoy version of every peptide by shuffling or reversing the target sequence. In shuffle or peptide-reverse mode, each peptide is either shuffled or reversed, leaving the N-terminal and C-terminal amino acids in place. Note that peptides that appear multiple times in the target database are only shuffled once. In peptide-reverse mode, palindromic peptides are shuffled. Also, if a shuffled peptide produces an overlap with the target or decoy database, then the peptide is re-shuffled up to 5 times. Note that, despite this repeated shuffling, homopolymers will appear in both the target and decoy database. The protein-reverse mode reverses the entire protein sequence, irrespective of the composite peptides. Default=shuffle.
--keep-terminal-aminos <N|C|NC|none>
– When creating decoy peptides using decoy-format=shuffle or decoy-format=peptide-reverse, this option specifies whether the N-terminal and C-terminal amino acids are kept in place or allowed to be shuffled or reversed. For a target peptide "EAMPK" with decoy-format=peptide-reverse, setting keep-terminal-aminos to "NC" will yield "EPMAK"; setting it to "C" will yield "PMAEK"; setting it to "N" will yield "EKPMA"; and setting it to "none" will yield "KPMAE". Default = NC.
--seed <integer>
– Set the seed of the random number generator with the given unsigned integer. When given the string "time", the seed is set with the system time. Default=1.
Enzymatic digestion
--enzyme <string>
– Specify the enzyme used to digest the proteins in silico. Available enzymes (with the corresponding digestion rules indicated in parentheses) include no-enzyme ([X]|[X]), trypsin ([RK]|{P}), trypsin/p ([RK]|[]), chymotrypsin ([FWYL]|{P}), elastase ([ALIV]|{P}), clostripain ([R]|[]), cyanogen-bromide ([M]|[]), iodosobenzoate ([W]|[]), proline-endopeptidase ([P]|[]), staph-protease ([E]|[]), asp-n ([]|[D]), lys-c ([K]|{P}), lys-n ([]|[K]), arg-c ([R]|{P}), glu-c ([DE]|{P}), pepsin-a ([FL]|{P}), elastase-trypsin-chymotrypsin ([ALIVKRWFY]|{P}). Specifying --enzyme no-enzyme yields a non-enzymatic digest. Warning: the resulting index may be quite large. Default=trypsin.
--custom-enzyme <residues before cleavage>|<residues after cleavage>
– Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, and separated by a |. The first list contains residues required/prohibited before the cleavage site, and the second list contains residues required/prohibited after the cleavage site. If the residues are required for digestion, they are enclosed in square brackets, '[' and ']'. If the residues prevent digestion, they are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D].
--digestion <full-digest|partial-digest>
– Specify whether every peptide in the database must have two enzymatic termini (full-digest) or if peptides with only one enzymatic terminus are also included (partial-digest). Default=full-digest.
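The bracket-and-brace rule notation can be read as a predicate on each pair of adjacent residues. The following Python sketch shows one way to interpret a rule such as [RK]|{P} and produce a full digest. It is an illustration under stated assumptions, not Tide's implementation: the function names are hypothetical, an empty list ([]) is assumed to place no constraint, and no length or mass filtering is applied.

```python
from typing import List, Tuple

Side = Tuple[str, bool]  # (residue letters, True if required / False if prohibited)


def parse_rule(rule: str) -> Tuple[Side, Side]:
    """Split a rule like '[RK]|{P}' into its before and after sides."""
    def side(text: str) -> Side:
        return text[1:-1], text.startswith("[")

    before, after = rule.split("|")
    return side(before), side(after)


def _matches(residue: str, spec: Side) -> bool:
    residues, required = spec
    if required:
        # Assumption: '[]' and '[X]' both mean "any residue".
        return residues in ("", "X") or residue in residues
    return "X" not in residues and residue not in residues


def cleavage_sites(protein: str, rule: str) -> List[int]:
    """Return indices i such that the enzyme cuts between i-1 and i."""
    before, after = parse_rule(rule)
    return [i for i in range(1, len(protein))
            if _matches(protein[i - 1], before) and _matches(protein[i], after)]


def digest(protein: str, rule: str, missed_cleavages: int = 0) -> List[str]:
    """Fully enzymatic peptides with up to `missed_cleavages` skipped sites."""
    bounds = [0] + cleavage_sites(protein, rule) + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + 2 + missed_cleavages, len(bounds))
        for stop in range(start + 1, last):
            peptides.append(protein[bounds[start]:bounds[stop]])
    return peptides


if __name__ == "__main__":
    print(digest("MKAARPGK", "[RK]|{P}"))     # trypsin: no cut between R and P
    print(digest("MKAARPGK", "[RK]|{P}", 1))  # allowing one missed cleavage
    print(digest("AADKD", "[X]|[D]"))         # AspN-style rule: cut before D
```

In this sketch every peptide starts and ends at a cleavage site or a protein terminus, which corresponds to the full-digest setting; a partial digest would also need peptides with only one such boundary.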
--missed-cleavages <integer>
– Maximum number of missed cleavages per peptide to allow in enzymatic digestion. If this option is not specified, then missed cleavages are not allowed.
Input and output
--output-dir <filename>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite <T|F>
– Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
--peptide-list T|F
– Create in the output directory a text file listing all of the peptides in the database, along with their neutral masses, one per line. If decoys are generated, then a second file will be created containing the decoy peptides. Decoys that also appear in the target database are marked with an asterisk in a third column. Default = F.
--parameter-file <filename>
– A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
--verbosity <0-100>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
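Example: terminal handling in peptide-reverse decoys. The "EAMPK" example given for the keep-terminal-aminos option can be reproduced with a short Python sketch. The function `reverse_decoy` is hypothetical and covers only the reversal itself; the palindrome handling and re-shuffling described for the decoy-format option are left out.

```python
def reverse_decoy(peptide: str, keep: str = "NC") -> str:
    """Reverse a peptide while optionally pinning its terminal residues.

    `keep` mirrors the documented values N, C, NC, and none.
    """
    if keep not in ("N", "C", "NC", "none"):
        raise ValueError(f"unknown keep value: {keep!r}")
    start = 1 if keep in ("N", "NC") else 0
    stop = len(peptide) - 1 if keep in ("C", "NC") else len(peptide)
    # Only the slice between the pinned termini is reversed.
    return peptide[:start] + peptide[start:stop][::-1] + peptide[stop:]


if __name__ == "__main__":
    for keep in ("NC", "C", "N", "none"):
        print(keep, reverse_decoy("EAMPK", keep))
```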