crux generate-peptides
Usage:
crux generate-peptides [options] <protein input file>
Description:
This command takes as input a protein FASTA file and outputs the corresponding list of peptides, as well as a matched list of decoy peptides and decoy proteins. Decoys are generated by either reversing or shuffling the non-terminal amino acids of each peptide. When shuffling, the program reshuffles a peptide multiple times if necessary to try to ensure that there is no overlap between the target and decoy peptides. For homopolymers, this is not possible; in that case, each target/decoy overlap is recorded in the log file.
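The decoy procedure above can be sketched in Python. This is a minimal illustration, not Crux's implementation. The function names (make_decoy, generate_decoys) are hypothetical. The limit of 5 reshuffles comes from the --decoys option description. Treating a peptide as "palindromic" when reversing its interior reproduces the target is an assumption. Overlaps are printed rather than written to a log file.

```python
import random

def _shuffle_interior(peptide, rng):
    """Permute the residues between the fixed N- and C-terminal residues."""
    middle = list(peptide[1:-1])
    rng.shuffle(middle)
    return peptide[0] + "".join(middle) + peptide[-1]

def make_decoy(peptide, targets, decoys, mode="shuffle", max_reshuffles=5, rng=random):
    """Return (decoy, overlaps); overlaps is True if the decoy still collides
    with a target peptide or with a previously generated decoy."""
    if len(peptide) < 3:
        # No interior residues to move: the decoy is necessarily the target.
        return peptide, True
    if mode == "reverse":
        decoy = peptide[0] + peptide[1:-1][::-1] + peptide[-1]
        if decoy != peptide:
            return decoy, decoy in targets or decoy in decoys
        # Assumption: "palindromic" means reversing the interior gives the
        # target back; such peptides fall through to shuffling.
    decoy = _shuffle_interior(peptide, rng)
    attempts = 0
    while (decoy in targets or decoy in decoys) and attempts < max_reshuffles:
        decoy = _shuffle_interior(peptide, rng)
        attempts += 1
    # Homopolymers still overlap after every reshuffle.
    return decoy, decoy in targets or decoy in decoys

def generate_decoys(peptides, mode="shuffle", seed=0):
    """Return one decoy per input peptide, in the same order as the input."""
    rng = random.Random(seed)
    targets = set(peptides)
    decoy_for = {}  # each distinct target is processed once, even if repeated
    decoys = set()
    for peptide in peptides:
        if peptide in decoy_for:
            continue
        decoy, overlaps = make_decoy(peptide, targets, decoys, mode, rng=rng)
        if overlaps:
            # Hypothetical stand-in for the log-file entry.
            print(f"target/decoy overlap: {peptide} -> {decoy}")
        decoy_for[peptide] = decoy
        decoys.add(decoy)
    return [decoy_for[p] for p in peptides]
```

Caching one decoy per distinct target is how the sketch mirrors the rule that peptides appearing multiple times are shuffled only once.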
The program considers only the standard set of 20 amino acids. Peptides containing letters that do not correspond to a standard amino acid (B, J, O, U, X, Z) are skipped. Non-alphanumeric characters are ignored completely.
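A minimal sketch of these character rules follows. It assumes that non-alphanumeric characters are stripped from each protein sequence before digestion and that letter case is left unchanged. clean_sequence and keep_peptide are hypothetical names, not part of Crux.

```python
import re

# The 20 standard amino acids; any other letter (B, J, O, U, X, Z) disqualifies a peptide.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def clean_sequence(raw):
    """Drop every non-alphanumeric character (e.g. '*', '-', whitespace).
    Assumption: this happens on the protein sequence before digestion."""
    return re.sub(r"[^0-9A-Za-z]", "", raw)

def keep_peptide(peptide):
    """Keep a peptide only if every residue is one of the 20 standard amino acids,
    so peptides containing B, J, O, U, X or Z are skipped."""
    return len(peptide) > 0 and set(peptide) <= STANDARD_AMINO_ACIDS
```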
Input:
- <protein input file> – A file in FASTA format containing proteins.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- peptides.target.txt: a text file containing the target peptides, one per line.
- peptides.decoy.txt: a text file containing the decoy peptides, one per line. There is a one-to-one correspondence between targets and decoys.
- proteins.decoy.txt: a FASTA format file containing decoy proteins, in which all of the peptides have been replaced with their shuffled or reversed counterparts. Note that this file will only be created if the enzyme specificity is "full-digest" and no missed cleavages are allowed.
- generate-decoys.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- tide-search.log.txt: a log file containing a copy of all messages that were printed to the screen during execution.
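The one-to-one correspondence between the two peptide files can be used to pair each target with its decoy. The sketch below assumes, beyond what is stated above, that the files are line-aligned (line i of peptides.target.txt matches line i of peptides.decoy.txt) and contain no header lines. pair_peptides and read_pairs are hypothetical helpers.

```python
from pathlib import Path

def pair_peptides(target_text, decoy_text):
    """Pair targets with decoys by line position, ignoring blank lines.
    Assumption: the two files are line-aligned with no header."""
    targets = [line.strip() for line in target_text.splitlines() if line.strip()]
    decoys = [line.strip() for line in decoy_text.splitlines() if line.strip()]
    if len(targets) != len(decoys):
        raise ValueError(f"{len(targets)} targets but {len(decoys)} decoys")
    return list(zip(targets, decoys))

def read_pairs(output_dir="crux-output"):
    """Read the target and decoy peptide files from the output folder."""
    out = Path(output_dir)
    return pair_peptides((out / "peptides.target.txt").read_text(),
                         (out / "peptides.decoy.txt").read_text())
```

Peptides whose decoy is identical to the target, such as homopolymers, can then be listed with [t for t, d in pairs if t == d].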
Options:
Peptide properties
--max-length <integer>
– Specify the maximum length of the peptides included in the index. Default=50.
--min-length <integer>
– Specify the minimum length of the peptides included in the index. Default=6.
--max-mass <integer>
– Specify the maximum monoisotopic mass (in Da) of the peptides included in the index. Default=7200.
--min-mass <integer>
– Specify the minimum monoisotopic mass (in Da) of the peptides included in the index. Default=200.
--monoisotopic-precursor <T|F>
– When computing the mass of a peptide, use monoisotopic masses rather than average mass values. Default = F.
Decoy database generation
--decoys <none|shuffle|reverse>
– Include a decoy version of every peptide by shuffling or reversing the target sequence. Each peptide is either reversed or shuffled, leaving the N-terminal and C-terminal amino acids in place. Note that peptides that appear multiple times in the target database are only shuffled once. In reverse mode, palindromic peptides are shuffled. Also, if a shuffled peptide produces an overlap with the target or decoy database, then the peptide is re-shuffled up to 5 times. Note that, despite this repeated shuffling, homopolymers will appear in both the target and decoy database. Default=shuffle.
Enzymatic digestion
--enzyme <string>
– Specify the enzyme used to digest the proteins in silico. Available enzymes include trypsin, trypsin/p, chymotrypsin, elastase, clostripain, cyanogen-bromide, idosobenzoate, proline-endopeptidase, staph-protease, asp-n, lys-c, lys-n, arg-c, glu-c, pepsin-a, elastase-trypsin-chymotrypsin. Specifying --enzyme no-enzyme yields a non-enzymatic digest. Warning: the resulting index may be quite large. Default=trypsin.
--digestion <full-digest|partial-digest>
– Specify whether every peptide in the database must have two enzymatic termini (full-digest) or if peptides with only one enzymatic terminus are also included (partial-digest). Default=full-digest.
--missed-cleavages <integer>
– Maximum number of missed cleavages per peptide to allow in enzymatic digestion. If this option is not specified, then missed cleavages are not allowed.
Input and output
--output-dir <filename>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite <T|F>
– Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
--parameter-file <filename>
– A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
--verbosity <0-100>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
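To show how the peptide-property and enzymatic-digestion options interact at their default values, here is a minimal Python sketch of a full tryptic digest with missed cleavages, followed by length and mass filtering. It makes several simplifying assumptions:
- It uses the common trypsin rule: cleave after K or R unless the next residue is P.
- Masses are monoisotopic residue masses plus one water, with no residue modifications.
- It does not model --digestion partial-digest, other enzymes, or --monoisotopic-precursor.
All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini

def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide (assumption: no modifications)."""
    return sum(MONO_RESIDUE[aa] for aa in peptide) + WATER

def tryptic_sites(protein):
    """Cleavage positions: after K or R unless followed by P, plus both protein ends.
    Assumption: the common trypsin rule; trypsin/p would drop the proline exception."""
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites

def digest(protein, missed_cleavages=0, min_length=6, max_length=50,
           min_mass=200.0, max_mass=7200.0):
    """Full digest: both ends of every peptide are cleavage sites or protein ends.
    Defaults mirror the documented option defaults."""
    sites = tryptic_sites(protein)
    peptides = []
    for i in range(len(sites) - 1):
        # Spanning from site i to site j skips j - i - 1 sites (missed cleavages).
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(sites))):
            peptide = protein[sites[i]:sites[j]]
            if not (min_length <= len(peptide) <= max_length):
                continue
            if min_mass <= peptide_mass(peptide) <= max_mass:
                peptides.append(peptide)
    return peptides
```

Each additional allowed missed cleavage lets a peptide span one more cleavage site. As a result, the number of peptides grows quickly as --missed-cleavages increases.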