crux hardklor
Usage:
crux hardklor [options] <spectra>
Description:
Hardklör analyzes high-resolution mass spectra, identifying protein or peptide isotope distributions and determining the corresponding monoisotopic masses and charge states. Hardklör is specifically designed to handle overlapping isotope distributions in a single spectrum. A detailed description of the Hardklör algorithm is given in
Hoopmann MR, Finney GL and MacCoss MJ. "High-speed data reduction, feature detection, and MS/MS spectrum quality assessment of shotgun proteomics data sets using high-resolution mass spectrometry." Analytical Chemistry. 79:5620-5632 (2007).
Input:
- <spectra> – The name of a file from which to parse high-resolution spectra. The file may be in MS1 (.ms1), binary MS1 (.bms1), compressed MS1 (.cms1), or mzXML (.mzXML) format.
Output:
The program writes files to the folder crux-output by default. The name of the output folder can be set by the user using the --output-dir option. The following files will be created:
- hardklor.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
- hardklor.mono.txt: a tab-delimited text file containing one line for each isotope distribution. The columns appear in the following order:
  - scan: The scan number assigned to this spectrum in the input file.
  - retention time: The time (in seconds) at which the spectrum was collected.
  - mass: The uncharged monoisotopic mass of the protein or peptide.
  - charge: The inferred charge state of the protein or peptide.
  - intensity: The intensity of the base isotope peak of the model used to predict the protein or peptide.
  - m/z: The m/z of the base peak.
  - s/n: The signal-to-noise threshold, i.e., the relative abundance a peak must exceed in the spectrum window to be considered in the scoring algorithm. Note that this is a local noise threshold for the area of the spectrum in which the peptide was identified.
  - modifications: Deviations from the averagine model. Only modifications specified by the user are considered. If no modifications are found in a particular PPID, then the column is marked with an underscore.
  - dotp: The dot product score, which applies to all predictions in a given spectrum window. Thus, if two protein or peptide predictions share the same spectrum window, they have a single dot product score that is the score of their combined peaks.
- hardklor.log.txt: a log file containing a copy of all messages that were printed to stderr.
Options:
--hardklor-algorithm <name>
– Choose the algorithm for analyzing combinations of multiple peptide or protein isotope distributions. There are five algorithms to choose from:
- basic – Computes all combinatorial possibilities and returns the combination with the highest score.
- fewest-peptides – Computes increasing depths of combinations until the score threshold is exceeded. The smallest combination exceeding the threshold is returned, preventing over-fitting of the data.
- fast-fewest-peptides – Same as the fewest-peptides algorithm, but trades memory usage for speed. Use this method if there is sufficient memory on the system.
- fewest-peptides-choice – Same as the fewest-peptides algorithm, but adds a heuristic to evaluate if further combinatorial analysis would produce a better score. This method can dramatically improve speed, but may not be as accurate.
- fast-fewest-peptides-choice – Same as the fewest-peptides-choice algorithm, but trades memory usage for speed. Use this method if there is sufficient memory on the system.
The default setting is fast-fewest-peptides.
--cdm B|F|P|Q|S
– Choose the charge state determination method. There are five methods to choose from:
- B – Basic method, assume all charge states are possible.
- F – Fast Fourier transform.
- P – Patterson algorithm.
- Q – QuickCharge method, uses inverse peak distances.
- S – Senko method, or combined Fast Fourier Transform and Patterson algorithm.
The default setting is Q.
--min-charge <int>
– Set the minimum charge state to look for when analyzing a spectrum. The default value is 1.
--max-charge <int>
– Set the maximum charge state to look for when analyzing a spectrum. The default value is 5.
--corr <float>
– Set the correlation threshold to accept a predicted isotope distribution. Valid values are any decimal value between 0.0 and 1.0, inclusive. The default value is 0.85.
--depth <int>
– Set the depth of combinatorial analysis. This is the maximum number of protein or peptide distributions that can be combined to estimate the observed data at any given spectrum segment. The default value is 3.
--distribution-area <T|F>
– Report peptide intensities as the distribution area. Default = F.
--averagine-mod <string>
– Include alternative averagine models in the analysis that incorporate additional atoms or isotopic enrichments. Modifications are represented as text strings. Additional atoms are included in the model by entering an atomic formula such as "PO2" or "Cl". Isotopic enrichment is included by specifying the percent enrichment (as a decimal) followed by the atom being enriched and an index of the isotope. For example, 0.75H1 specifies 75% enrichment of the first heavy isotope of hydrogen, i.e., 75% deuterium enrichment. Two or more modifications can be combined into the same model by separating them with colons: "B2:0.5B1". Multiple averagine models are supported in a single analysis by separating the models with a semicolon: "B2:0.5B1;C2:0.7C1".
--mzxml-filter ms1|ms2|ms3|none
– Set a filter for mzXML files. If you want to analyze only the MS2 scans in your mzXML file, specify --mzxml-filter ms2. Default = none.
--no-base T|F
– Specify "no base" averagine. Only modified averagine models will be used in the analysis. Default = F.
--max-p <int>
– Set the maximum number of peptides or proteins that are estimated from the peaks found in a spectrum segment. The default value is 10.
--resolution <double>
– Set the resolution of the observed spectra at m/z 400. Resolution is a unitless quantity defined as the mass of the peak divided by the associated width at half maximum height (FWHM). Used in conjunction with --instrument. The default is 100000.
--instrument fticr|orbi|tof|qit
– Type of instrument on which the data was collected. Used in conjunction with --resolution. The default is fticr.
--centroided T|F
– Are the spectra centroided? Default = F.
--scan-number <number range>
– A single scan number or a range of numbers to be analyzed. A range should be specified as 'first-last', which includes scans 'first' and 'last'. Default = search all spectra.
--sensitivity 0|1|2|3
– Set the sensitivity level. There are four levels, 0 (low), 1 (moderate), 2 (high), and 3 (max). Increasing the sensitivity will increase computation time, but will also yield more isotope distributions. The default value is 2.
--signal-to-noise <float>
– Set the signal-to-noise threshold. Any integer or decimal value greater than or equal to 0.0 is valid. The default value is 1.0.
--sn-window <float>
– Set the signal-to-noise window length (in m/z). Because noise may be non-uniform across a spectrum, this value adjusts the segment size considered when calculating a signal-over-noise ratio. The default value is 250.0.
--static-sn <T|F>
– If true, Hardklor will calculate the local noise levels across the spectrum using --sn-window, then select a floor of this set of noise levels to apply to the whole spectrum. Default is true.
--mz-window <double>-<double>
– Restrict analysis to only a small window in each segment (in m/z). The user must specify the starting and ending m/z values between which the analysis will be performed. By default the whole spectrum is analyzed.
--max-width <float>
– Set the maximum width of any set of peaks in a spectrum when computing the results (in m/z). For example, if the value is 5.0, then sets of peaks wider than 5 m/z are divided into smaller sets prior to analysis. The default value is 4.0.
Output
--fileroot <string>
– The fileroot string will be added as a prefix to all output file names. Default = none.
--output-dir <filename>
– The name of the directory where output files will be created. Default = crux-output.
--overwrite <T|F>
– Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
--parameter-file <filename>
– A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
--verbosity <0-100>
– Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
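The string syntax accepted by --averagine-mod can be easier to follow with a small worked example. The Python sketch below is not part of Crux; the function names are hypothetical, and the grammar is inferred only from the examples in the option description (semicolons between models, colons between modifications, and either an atomic formula or a decimal enrichment followed by an element and isotope index). It may therefore accept or reject strings differently from Hardklör itself.

```python
import re

# Enrichment: decimal fraction, element symbol, isotope index (e.g. "0.75H1").
_ENRICHMENT = re.compile(r"(\d*\.?\d+)([A-Z][a-z]?)(\d+)")
# Atomic formula: one or more element symbols with optional counts (e.g. "PO2").
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_ATOM_COUNT = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_modification(token):
    """Classify one modification as an enrichment or an atomic formula."""
    token = token.strip()
    match = _ENRICHMENT.fullmatch(token)
    if match:
        return ("enrichment", float(match.group(1)), match.group(2), int(match.group(3)))
    if _FORMULA.fullmatch(token):
        counts = {}
        for element, count in _ATOM_COUNT.findall(token):
            # A missing count means a single atom, as in "Cl".
            counts[element] = counts.get(element, 0) + int(count or 1)
        return ("atoms", counts)
    raise ValueError(f"unrecognized modification: {token!r}")


def parse_averagine_mod(value):
    """Split on ';' into models, then on ':' into modifications."""
    return [
        [parse_modification(token) for token in model.split(":")]
        for model in value.split(";")
        if model.strip()
    ]
```

Under these assumptions, `parse_averagine_mod("B2:0.5B1;C2:0.7C1")` yields two models, each holding one atomic formula and one enrichment.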
Notes:
- For users familiar with the standalone version of Hardklör, the parameter mappings are here.
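For downstream analysis, the hardklor.mono.txt file described in the Output section can be loaded with a few lines of Python. This is a minimal sketch, not part of Crux: the function name, the dictionary keys, and the chosen numeric types are assumptions, and the only thing taken from the documentation is that the file is tab-delimited with nine columns in the listed order. Lines that do not have nine fields or that fail numeric conversion (for example a header row, if one is present) are skipped.

```python
import csv

# Column order as documented for hardklor.mono.txt; key names and
# converters are illustrative choices, not defined by Crux.
COLUMNS = [
    ("scan", int),
    ("retention_time", float),
    ("mass", float),
    ("charge", int),
    ("intensity", float),
    ("mz", float),
    ("sn", float),
    ("modifications", str),
    ("dotp", float),
]


def read_mono(lines):
    """Return one dict per isotope distribution from tab-delimited lines."""
    records = []
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # blank or malformed line
        try:
            record = {name: convert(text) for (name, convert), text in zip(COLUMNS, fields)}
        except ValueError:
            continue  # non-numeric field, e.g. a header row
        records.append(record)
    return records
```

A typical call, assuming the default output folder, would be `with open("crux-output/hardklor.mono.txt") as handle: records = read_mono(handle)`. An underscore in the `modifications` field then indicates that no modification was found for that distribution.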