crux comet
Usage:
crux comet [options] <spectra> <protein input>
Description:
This command searches a protein database with a set of spectra, assigning peptide sequences to the observed spectra. This search engine was developed by Jimmy Eng at the University of Washington Proteomics Resource.
Although its history goes back two decades, the Comet search engine was first made publicly available in August 2012 on SourceForge. Comet is multithreaded and supports multiple input and output formats.
"Comet: an open source tandem mass spectrometry sequence database search tool." Eng JK, Jahan TA, Hoopmann MR. Proteomics. 2012 Nov 12. doi: 10.1002/pmic.201200439
Input:
- <spectra> – The name of the file from which to parse the spectra. Any file format supported by ProteoWizard is accepted, with the exception of vendor formats.
- <database_name> – The name of the file in fasta format from which to retrieve proteins and peptides. The database can contain amino acid sequences or nucleic acid sequences. If the sequences are nucleic acid sequences, then you must instruct Comet to translate these to amino acid sequences by setting nucleotide_reading_frame to a value between 1 and 9.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- comet.params.txt: a file containing the name and value of all parameters/options for the current operation. Some parameters in the file may not have been used in the operation. The resulting file can be used with the --parameter-file option for other Crux programs.
- comet.target.txt: a tab-delimited text file containing the target PSMs. See txt file format for a list of the fields.
- comet.log.txt: a log file containing a copy of all messages that were printed to standard error.
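Because comet.target.txt is tab-delimited, it can be loaded with standard tools. The following Python sketch reads the header and rows without assuming any particular column names; the function name read_psms is hypothetical and not part of Crux, and the path shown is simply the default location described above.

```python
import csv
from pathlib import Path


def read_psms(path):
    """Return (column names, rows) from a tab-delimited PSM file.

    Hypothetical helper, not part of Crux. Each row is a dict keyed by
    whatever column names appear in the file's header line.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        return reader.fieldnames, rows


if __name__ == "__main__":
    # Default location when --output-dir and --fileroot are not set.
    target = Path("crux-output") / "comet.target.txt"
    if target.exists():
        columns, psms = read_psms(target)
        print(f"{len(psms)} PSMs; columns: {columns}")
```

The actual column names are listed in the txt file format documentation; the sketch only reports whatever header the file contains.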
Options:
Because Comet was developed externally to Crux, the names of various command line parameters are somewhat different. For example, Comet uses underscores ("_") within parameter names, whereas other Crux commands use hyphens ("-").
Database
CPU threads
Masses
- peptide_mass_tolerance
- peptide_mass_units
- mass_type_parent
- mass_type_fragment
- precursor_tolerance_type
- isotope_error
Search enzyme
Fragment ions
- fragment_bin_tol
- fragment_bin_offset
- theoretical_fragment_ions
- use_A_ions
- use_B_ions
- use_C_ions
- use_X_ions
- use_Y_ions
- use_Z_ions
- use_NL_ions
- use_sparse_matrix
Output
--fileroot <string> – The fileroot string will be added as a prefix to all output file names. Default = none.
--output-dir <filename> – The name of the directory where output files will be created. Default = crux-output.
--overwrite <T|F> – Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
- output_txtfile
- output_sqtfile
- output_pepxmlfile
- output_pinxmlfile
- output_outfiles
- print_expect_score
- num_output_lines
- show_fragment_ions
- sample_enzyme_number
mzXML/mzML parameters
Miscellaneous parameters
--parameter-file <filename> – A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
--verbosity <0-100> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
- digest_mass_range
- num_results
- skip_researching
- max_fragment_charge
- max_precursor_charge
- nucleotide_reading_frame
- clip_nterm_methionine
- spectrum_batch_size
- output_suffix
Spectral processing
Variable modifications
- variable_mod1
- variable_mod2
- variable_mod3
- variable_mod4
- variable_mod5
- variable_mod6
- max_variable_mods_in_peptide
- variable_C_terminus
- variable_N_terminus
- variable_C_terminus_distance
- variable_N_terminus_distance
Static modifications
- add_Cterm_peptide
- add_Nterm_peptide
- add_Cterm_protein
- add_Nterm_protein
- add_G_glycine
- add_A_alanine
- add_S_serine
- add_P_proline
- add_V_valine
- add_T_threonine
- add_C_cysteine
- add_L_leucine
- add_I_isoleucine
- add_N_asparagine
- add_D_aspartic_acid
- add_Q_glutamine
- add_K_lysine
- add_E_glutamic_acid
- add_M_methionine
- add_O_ornithine
- add_H_histidine
- add_F_phenylalanine
- add_R_arginine
- add_Y_tyrosine
- add_W_tryptophan
- add_B_user_amino_acid
- add_J_user_amino_acid
- add_U_user_amino_acid
- add_X_user_amino_acid
- add_Z_user_amino_acid
Enzyme handling
Enzymatic digestion rules are specified using search_enzyme_number and the following lines at the end of the parameter file:
#
# COMET_ENZYME_INFO _must_ be at the end of this parameters file
#
[COMET_ENZYME_INFO]
0. No_enzyme 0 - -
1. Trypsin 1 KR P
2. Trypsin/P 1 KR -
3. Lys_C 1 K P
4. Lys_N 0 K -
5. Arg_C 1 R P
6. Asp_N 0 D -
7. CNBr 1 M -
8. Glu_C 1 DE P
9. PepsinA 1 FL P
10. Chymotrypsin 1 FWYL P
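The enzyme entries above can also be read programmatically. The Python sketch below parses the block into a lookup table keyed by enzyme number. It assumes, following the standalone Comet documentation, that the columns are the enzyme number, the name, the cleavage sense (1 for cutting C-terminal to the listed residues, 0 for N-terminal), the residues at which cleavage occurs, and the residues that block cleavage. The names Enzyme and parse_enzyme_info are hypothetical and not part of Crux or Comet.

```python
from typing import NamedTuple

# The enzyme block exactly as it appears at the end of the parameter file.
ENZYME_INFO = """\
#
# COMET_ENZYME_INFO _must_ be at the end of this parameters file
#
[COMET_ENZYME_INFO]
0. No_enzyme 0 - -
1. Trypsin 1 KR P
2. Trypsin/P 1 KR -
3. Lys_C 1 K P
4. Lys_N 0 K -
5. Arg_C 1 R P
6. Asp_N 0 D -
7. CNBr 1 M -
8. Glu_C 1 DE P
9. PepsinA 1 FL P
10. Chymotrypsin 1 FWYL P
"""


class Enzyme(NamedTuple):
    # Field meanings are an assumption based on the standalone Comet docs.
    number: int
    name: str
    sense: int            # assumed: 1 = cut C-terminal, 0 = cut N-terminal
    cut_residues: str     # "-" means none listed
    no_cut_residues: str  # "-" means none listed


def parse_enzyme_info(text):
    """Map enzyme number -> Enzyme for every well-formed entry line."""
    enzymes = {}
    for line in text.splitlines():
        fields = line.split()
        # Entry lines have five fields and start with "<number>.".
        if len(fields) != 5 or not fields[0].rstrip(".").isdigit():
            continue
        number = int(fields[0].rstrip("."))
        enzymes[number] = Enzyme(
            number, fields[1], int(fields[2]), fields[3], fields[4]
        )
    return enzymes


enzymes = parse_enzyme_info(ENZYME_INFO)
print(enzymes[1])
```

Comment lines and the [COMET_ENZYME_INFO] header are skipped because they do not match the five-field entry pattern.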
The search_enzyme_number keys into this list of enzymes. This list can be further modified to support other enzymes.
Differences between the Crux and standalone versions of Comet
Unlike the standalone version of Comet, the comet command in Crux
- reads input spectra in any format supported by ProteoWizard,
- offers a variety of additional options (--fileroot, --output-dir, --overwrite, --verbosity), and
- prints all messages both to the screen and to a log file.
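As an illustration of how the Crux-specific options combine with the usage line at the top of this page, the following Python sketch assembles a crux comet argument list. It uses only options documented on this page; the helper name build_comet_command and the file names are hypothetical, and the sketch builds the command without running it.

```python
import shlex


def build_comet_command(spectra, database, output_dir=None, fileroot=None,
                        overwrite=None, verbosity=None, parameter_file=None):
    """Return an argument list for `crux comet [options] <spectra> <protein input>`.

    Hypothetical helper. Options left as None are omitted so that
    Crux applies its own defaults.
    """
    args = ["crux", "comet"]
    if fileroot is not None:
        args += ["--fileroot", fileroot]
    if output_dir is not None:
        args += ["--output-dir", output_dir]
    if overwrite is not None:
        # Crux expects T or F for boolean options.
        args += ["--overwrite", "T" if overwrite else "F"]
    if verbosity is not None:
        args += ["--verbosity", str(verbosity)]
    if parameter_file is not None:
        args += ["--parameter-file", parameter_file]
    # Positional arguments come last: spectra file, then protein database.
    args += [spectra, database]
    return args


# Placeholder file names, for illustration only.
command = build_comet_command("demo.mzML", "demo.fasta",
                              output_dir="comet-results",
                              overwrite=True, verbosity=40)
print(shlex.join(command))
```

On a system where Crux is installed, the resulting list could be passed to subprocess.run(command, check=True) to launch the search.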