crux create-index
Usage:
crux create-index [options] <protein input file> <index name>
Description:
Given a protein fasta sequence database as input, generate an index of all of its peptides and save it to disk. Optionally, the index may also include decoy peptides to be used for post-processing commands such as crux percolator, crux q-ranker, crux compute-q-values, or crux barista. The index can be provided as input to crux search-for-matches instead of the fasta file to speed up the search.
Input:
- <protein input file> – The name of the file (in fasta format) from which to parse proteins.
- <index name> – The name of the directory where the newly created index will be placed.
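As an illustration, the following Python sketch assembles a create-index command line from the two required arguments and a few of the options documented on this page. The file name proteins.fasta, the directory name my_index, and the helper build_create_index_command are placeholders made up for this example and are not part of Crux. The sketch only prints the command; it does not run it.

```python
import shlex

def build_create_index_command(fasta, index_dir, **options):
    """Return an argv list for `crux create-index` (hypothetical helper)."""
    argv = ["crux", "create-index"]
    for name, value in options.items():
        # A Python keyword such as min_mass becomes the flag --min-mass.
        argv += ["--" + name.replace("_", "-"), str(value)]
    # Options come first, followed by the two required positional arguments.
    argv += [fasta, index_dir]
    return argv

cmd = build_create_index_command(
    "proteins.fasta", "my_index",  # placeholder input file and index name
    enzyme="trypsin", missed_cleavages=2, decoys="reverse",
)
print(shlex.join(cmd))
```

Keeping the command as a list of arguments avoids shell quoting problems if it is later passed to subprocess.run.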
Output:
The program writes four types of files to the specified output directory: binary index files, a binary sequence file, a text file with settings, and a fasta file with decoy protein sequences to be used during search. The binary sequence file will have the same name as the input fasta file, but ending with 'binary_fasta'. The fasta file with decoy protein sequences will have the same name as the input fasta file, with the ending '-random.fasta' if random shuffling is used and '-reverse.fasta' if the protein sequences are reversed (see options below). The text file is named 'README'.
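To make the two kinds of decoy protein sequences concrete, here is a minimal Python sketch of reversing and shuffling a sequence. It illustrates the idea only: the function names, the fixed seed, and the example sequence are assumptions made for this example, and the shuffling procedure Crux actually uses may differ.

```python
import random

def reverse_decoy(protein):
    """Decoy made by reversing the whole protein sequence."""
    return protein[::-1]

def shuffle_decoy(protein, seed=0):
    """Decoy made by randomly permuting the residues of the protein."""
    residues = list(protein)
    # A seeded generator keeps the example reproducible.
    random.Random(seed).shuffle(residues)
    return "".join(residues)

target = "MKWVTFISLLLLFSSAYSR"  # made-up example sequence
print(reverse_decoy(target))
print(shuffle_decoy(target))
```

Both decoys keep the amino acid composition and length of the target, which is what makes them comparable to the real sequence.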
Options:
- --min-mass <float> – The minimum neutral mass of the peptides to place in the index. Default = 200.
- --max-mass <float> – The maximum neutral mass of the peptides to place in the index. Default = 7200.
- --min-length <int> – The minimum length of the peptides to place in the index. Default = 6.
- --max-length <int> – The maximum length of the peptides to place in the index. Default = 50.
- --enzyme trypsin|trypsin/p|chymotrypsin|elastase|clostripain|cyanogen-bromide|iodosobenzoate|proline-endopeptidase|staph-protease|asp-n|lys-c|lys-n|arg-c|glu-c|pepsin-a|elastase-trypsin-chymotrypsin|no-enzyme – Enzyme to use for in silico digestion of protein sequences. Used in conjunction with the digestion and missed-cleavages options. Use 'no-enzyme' for non-specific digestion. Digestion rules are as follows: enzyme name [cuts after one of these residues]|{but not before one of these residues}. trypsin [RK]|{P}, trypsin/p [RK]|[], elastase [ALIV]|{P}, chymotrypsin [FWYL]|{P}, clostripain [R]|[], cyanogen-bromide [M]|[], iodosobenzoate [W]|[], proline-endopeptidase [P]|[], staph-protease [E]|[], elastase-trypsin-chymotrypsin [ALIVKRWFY]|{P}, asp-n []|[D], lys-c [K]|{P}, lys-n []|[K], arg-c [R]|{P}, glu-c [DE]|{P}, pepsin-a [FL]|{P}. Default = trypsin.
- --custom-enzyme <residues before cleavage>|<residues after cleavage> – Specify rules for in silico digestion of protein sequences. Overrides the enzyme option. Two lists of residues are given, each enclosed in square brackets or curly braces, and separated by a |. The first list contains the residues required or prohibited before the cleavage site, and the second list contains the residues required or prohibited after the cleavage site. Residues that are required for digestion are enclosed in square brackets, '[' and ']'. Residues that prevent digestion are enclosed in curly braces, '{' and '}'. Use X to indicate all residues. For example, trypsin cuts after R or K but not before P, which is represented as [RK]|{P}. AspN cuts after any residue but only before D, which is represented as [X]|[D].
- --digestion full-digest|partial-digest – Degree of digestion used to generate peptides. Either both ends (full-digest) or at least one end (partial-digest) of a peptide must conform to the enzyme specificity rules. Used in conjunction with the enzyme or custom-enzyme option when enzyme is not set to 'no-enzyme'. Default = full-digest.
- --missed-cleavages <int> – Include in the index peptides containing up to <int> missed cleavage sites. Default = 0.
- --isotopic-mass average|mono – Specify the type of isotopic masses to use when calculating the peptide mass. Default = average.
- --peptide-list T|F – Create in the output directory a text file listing all of the peptides in the database, along with their neutral masses, one per line. Default = F.
- --decoys <string> – Include a decoy version of every peptide by shuffling or reversing the target sequence. Possible values are none, reverse, protein-shuffle, peptide-shuffle. Use 'reverse' to reverse each protein sequence, 'protein-shuffle' to shuffle each protein sequence, or 'peptide-shuffle' to shuffle the sequence between enzyme cleavage sites, leaving the termini in place. Use 'none' for no decoys. Default = peptide-shuffle.
- --overwrite T|F – Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
- --parameter-file <string> – A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
- --verbosity <0-100> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
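The digestion rule notation used by the enzyme and custom-enzyme options can be illustrated with a short Python sketch. It parses a rule such as [RK]|{P}, finds the cleavage sites, and lists fully digested peptides with up to a given number of missed cleavages, filtered by length. This is a sketch of the rules as described on this page, not Crux's implementation: the function names and example sequence are made up, an empty list is assumed to mean "no constraint" (as in trypsin/p [RK]|[]), and partial digestion and the mass limits are left out.

```python
import re

def parse_rule(rule):
    """Parse a rule such as '[RK]|{P}' into two residue predicates."""
    def side(text):
        m = re.fullmatch(r"\[([A-Z]*)\]|\{([A-Z]*)\}", text)
        if m is None:
            raise ValueError("bad rule part: " + text)
        if m.group(1) is not None:
            # Square brackets: one of these residues is required.
            residues = m.group(1)
            if residues == "" or "X" in residues:
                return lambda aa: True  # assumption: empty list = any residue
            return lambda aa: aa in residues
        # Curly braces: these residues prevent cleavage.
        residues = m.group(2)
        if "X" in residues:
            return lambda aa: False
        return lambda aa: aa not in residues

    before_text, after_text = rule.split("|")
    return side(before_text), side(after_text)

def digest(protein, rule="[RK]|{P}", missed_cleavages=0,
           min_length=6, max_length=50):
    """List fully digested peptides that pass the length filter."""
    before_ok, after_ok = parse_rule(rule)
    # Site i means the cut falls between protein[i - 1] and protein[i].
    sites = [i for i in range(1, len(protein))
             if before_ok(protein[i - 1]) and after_ok(protein[i])]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Spanning k extra boundaries yields a peptide with k missed cleavages.
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = protein[bounds[i]:bounds[j]]
            if min_length <= len(peptide) <= max_length:
                peptides.append(peptide)
    return peptides

# Made-up sequence, trypsin rule, one missed cleavage allowed.
print(digest("MKWVTFISLLLLFSSAYSRGVFRRDTHK", missed_cleavages=1))
```

The default arguments mirror the documented defaults (trypsin, 0 missed cleavages, lengths 6 to 50), so a call with only a sequence corresponds to a full digest with default settings.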
Parameter file options:
- <A-Z> <float> – Specify static modifications. This is a mass change applied to the given amino acid (in single-letter code, A through Z) for every peptide in which it occurs. Use the mod option when searching to generate peptides both with and without the mass change. Default C=57.
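The effect of a static modification on peptide mass can be sketched in Python as follows. The table holds standard monoisotopic residue masses (rounded), whereas Crux defaults to average masses, so the numbers are for illustration only. The function names are made up for this example; the mass window uses the documented min-mass and max-mass defaults.

```python
# Standard monoisotopic residue masses in daltons, rounded to 5 decimals.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the peptide termini

def neutral_mass(peptide, static_mods=None):
    """Sum residue masses, adding the static change to every occurrence."""
    # Default mirrors the documented default static modification, C=57.
    mods = {"C": 57.0} if static_mods is None else static_mods
    return WATER + sum(MONO[aa] + mods.get(aa, 0.0) for aa in peptide)

def in_mass_window(mass, min_mass=200.0, max_mass=7200.0):
    """Apply the documented default mass limits (200 to 7200)."""
    return min_mass <= mass <= max_mass

unmodified = neutral_mass("GC", static_mods={})
modified = neutral_mass("GC")
print(unmodified, in_mass_window(unmodified))
print(modified, in_mass_window(modified))
```

Because a static modification applies to every occurrence of the residue, it can move a peptide across the mass limits, as it does for the short example peptide here.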
Increasing the file limit:
The Crux indexing scheme relies on having a large number of files open simultaneously. On some operating systems, this will result in an error message like this:
WARNING: cannot open all file handlers needed
On a Unix system, you can find out the maximum number of files that can be open simultaneously using a command like this:
ulimit -a
You can then use
ulimit -n 1024
to reset the limit (in this case, to 1024 files). Note that once you have set the limit on open file handles in a shell, you cannot increase it again for the duration of that shell. However, if you start a new shell, you can once again increase the open file handle limit. New shells will remember the last limit set.
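The same check can be made from a script. The sketch below uses Python's standard resource module (Unix only) to read the open-file limit and raise the soft limit to 1024 if it is lower. The helper name is made up for this example. The change applies only to the Python process and any child processes it starts, so crux would have to be launched from the same script to benefit.

```python
import resource

def raise_open_file_limit(wanted=1024):
    """Raise the soft open-file limit to `wanted` if it is lower."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY or soft >= wanted:
        return soft  # already high enough, leave it alone
    if hard != resource.RLIM_INFINITY:
        # The soft limit can never exceed the hard limit.
        wanted = min(wanted, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return wanted

limit = raise_open_file_limit(1024)
print("soft limit on open files:", limit)
```

Only the soft limit is changed here; the hard limit is left untouched so the limit can still be adjusted later.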
Known Bugs:
In cygwin, the --overwrite T option does not work. A temporary directory is created with the new index but fails to be renamed correctly. As a workaround, delete the existing index and rename the new temporary index with the correct name.