Running a Simple Search Using Tide and Percolator

Now that your environment is set up and the two input files are in your working directory, you can conduct the search. The search compares each spectrum in demo.ms2 to peptides (subsequences of the proteins in small-yeast.fasta) stored in an index directory, yeast-index/. Peptides whose precursor mass is close to that of the observed spectrum are scored against that spectrum, and the top scores are reported in the output. To conduct the search, we first create a peptide index using tide-index and then execute the search using tide-search.
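The candidate-selection idea described above (only peptides whose mass is close to the observed precursor mass get scored) can be sketched in a few lines of shell. This is an illustration only, not Tide's implementation: the helper name candidates and the 3.0 Da window are assumptions made up for the example, and the masses are peptide and spectrum masses reported in this tutorial's Percolator output.

```shell
#!/bin/sh
# candidates OBSERVED_MASS WINDOW
# Reads "sequence<TAB>mass" lines on standard input and prints the sequences
# whose mass lies within WINDOW of OBSERVED_MASS.
# Hypothetical helper for illustration; the window value is an assumption,
# not a setting taken from Tide.
candidates() {
  awk -F '\t' -v obs="$1" -v win="$2" '
    {
      d = $2 - obs
      if (d < 0) d = -d        # absolute mass difference
      if (d <= win) print $1   # close enough to be scored
    }'
}

# Three peptides and one observed spectrum neutral mass (1382.7500).
printf 'TASEFDSAIAQDK\t1382.4467\nNFLETVELQVGLK\t1489.7318\nLDVDELGDVAQK\t1301.4160\n' |
  candidates 1382.7500 3.0
```

With these inputs only TASEFDSAIAQDK falls inside the window, so it is the only peptide that would go on to be scored against that spectrum in this simplified picture.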

  1. $ crux tide-index small-yeast.fasta yeast-index

    While generating the peptide index, you will see output like this:

    INFO: Beginning tide-index.
    INFO: Writing results to output directory 'crux-output'.
    INFO: CPU: Genomes-MacBook-Pro.local
    INFO: Wed Dec 18 11:12:00 SGT 2013
    INFO: Running tide-index...
    INFO: Writing results to output directory 'yeast-index'.
    INFO: Reading small-yeast.fasta and computing unmodified peptides...
    INFO: Writing decoy fasta...
    INFO: Reading proteins
    INFO: Precomputing theoretical spectra...
    INFO: Elapsed time: 0.293 s
    INFO: Finished crux tide-index.
    INFO: Return Code:0
    

    This command produces the peptide index in yeast-index and also produces a directory crux-output containing the following files:

    • tide-index.decoy.fasta – a set of decoy proteins, derived from the proteins in the input set,
    • tide-index.params.txt – a record of all the parameters used to build the index, and
    • tide-index.log.txt – a log file containing a copy of all the messages printed to the screen during indexing.

    Now you can run this command:

  2. $ crux tide-search --compute-sp T demo.ms2 yeast-index

    While the search is running, you will see output like this:

    INFO: Beginning tide-search.
    WARNING: The output directory 'crux-output' already exists.
    Existing files will not be overwritten.
    INFO: CPU: Genomes-MacBook-Pro.local
    INFO: Wed Dec 18 11:14:28 SGT 2013
    INFO: Running tide-search...
    INFO: Reading index yeast-index
    INFO: Reading spectra file demo.ms2
    INFO: Converting demo.ms2 to spectrumrecords format
    INFO: Sorting spectra
    INFO: Running search
    INFO: Elapsed time: 0.543 s
    INFO: Finished crux tide-search.
    INFO: Return Code:0
    

    The crux-output directory now contains four new files containing the search results:

    • tide-search.target.txt – search results in tab-delimited format.
    • tide-search.decoy.txt – search results from a decoy database in tab-delimited format.
    • tide-search.params.txt – a record of all the parameters used in the search.
    • tide-search.log.txt – a log file containing a copy of all the messages printed to the screen during the search.
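Because the result files are tab-delimited with a header row, a quick way to get oriented is to print each column name with its position. The helper below is a minimal sketch; list_columns is a name invented for this example and is not a Crux command.

```shell
#!/bin/sh
# list_columns FILE
# Prints "index: name" for every column in the header (first line) of a
# tab-delimited file. Hypothetical helper, not part of Crux.
list_columns() {
  awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }' "$1"
}

# Inspect the target results if the search has been run in this directory.
if [ -f crux-output/tide-search.target.txt ]; then
  list_columns crux-output/tide-search.target.txt
fi
```

Knowing the column positions makes it easier to pull out individual fields later with tools such as cut or awk.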

    The final step is to post-process the search results using Percolator. Each spectrum has been compared to many peptides and we would like to return only the best match for each spectrum. We also expect that some fraction of the spectra will not be identifiable as peptides (due to chemical noise, multiple peptides co-eluting, poor fragmentation, etc.). The analysis step filters out those spectra and ranks the matches by quality.

  4. $ crux percolator crux-output/tide-search.target.txt

    While the analysis is running, you will see output like this:

    INFO: Beginning percolator.
    WARNING: The output directory 'crux-output' already exists.
    Existing files will not be overwritten.
    INFO: CPU: Genomes-MacBook-Pro.local
    INFO: Wed Dec 18 11:19:20 SGT 2013
    INFO: Running make-pin with 'crux-output/tide-search.target.txt' and decoy file 'crux-output/tide-search.decoy.txt'.
    INFO: Finished make-pin.
    INFO: Percolator version 2.04, Build Date Dec 11 2013 16:46:25
    INFO: Copyright (c) 2006-9 University of Washington. All rights reserved.
    INFO: Written by Lukas Käll (lukall@u.washington.edu) in the
    INFO: Department of Genome Sciences at the University of Washington.
    INFO: Issued command:
    INFO: percolator -X crux-output/percolator.target.pout.xml -r crux-output/percolator.target.txt -B crux-output/percolator.decoy.txt -v 2 -P decoy_ --seed 1 -p 0.01000000 -n 0.00000000 --trainFDR 0.01000000 --testFDR 0.01000000 --maxiter 10 --train-ratio 0.60000000 -s crux-output/make-pin.pin.xml
    INFO: Started Wed Dec 18 11:19:21 2013
    INFO: Hyperparameters fdr=0.01, Cpos=0.01, Cneg=0, maxNiter=10
    INFO: WARNING: no valid local xml schema is available to validate the input.
    INFO: If further errors are encountered, please reinstall Percolator.
    INFO: WARNING: no valid local xml schema is available to validate the input.
    INFO: If further errors are encountered, please reinstall Percolator.
    INFO: enzyme=trypsin
    INFO: Features:
    INFO: lnrSp deltLCn deltCn Xcorr Sp IonFrac Mass PepLen Charge1 Charge2 Charge3 enzN enzC enzInt lnNumSP dM absdM 
    INFO: Train/test set contains 719 positives and 719 negatives, size ratio=1 and pi0=1
    INFO: selecting cneg by cross validation
    INFO: Estimating 55 over q=0.01 in initial direction
    INFO: Reading in data and feature calculation took 0.261052 cpu seconds or 0 seconds wall time
    INFO: ---Training with Cpos=0.01, Cneg selected by cross validation, fdr=0.01
    INFO: Iteration 1 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 2 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 3 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 4 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 5 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 6 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 7 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 8 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 9 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Iteration 10 :	After the iteration step, 63 target PSMs with q<0.01 were estimated by cross validation
    INFO: Obtained weights (only showing weights of first cross validation set)
    INFO: # first line contains normalized weights, second line the raw weights
    INFO: lnrSp	deltLCn	deltCn	Xcorr	Sp	IonFrac	Mass	PepLen	Charge1	Charge2	Charge3	enzN	enzC	enzInt	lnNumSP	dM	absdM	m0
    INFO: 0.0421	-0.0636	0.0226	0.1690	0.3100	0.0195	0.0120	0.0184	0.0017	-0.0057	0.0054	0.0000	0.0000	-0.0047	-0.0080	-0.0212	0.0178	-0.9205
    INFO: 0.0705	-0.2069	0.0994	0.2076	0.0011	0.1464	0.0000	0.0041	0.0046	-0.0120	0.0140	0.0000	0.0000	-0.0411	-0.0133	-0.0230	0.0278	-1.2319
    INFO: After all training done, 66 target PSMs with q<0.0100 were found when measuring on the test set
    INFO: Found 66 target PSMs scoring over 1.0000% FDR level on testset
    INFO: Merging results from 3 datasets
    INFO: Tossing out "redundant" PSMs keeping only the best scoring PSM for each unique peptide.
    INFO: Selecting pi_0=0.8935
    INFO: Calibrating statistics - calculating q values
    INFO: New pi_0 estimate on merged list gives 15 peptides over q=0.0100
    INFO: Calibrating statistics - calculating Posterior error probabilities (PEPs)
    INFO: Processing took 5.385 cpu seconds or 5 seconds wall time
    INFO: Elapsed time: 6.01 s
    INFO: Finished crux percolator.
    INFO: Return Code:0
    

    The crux-output directory will now contain eight new files:

    • percolator.target.psms.txt – a list of peptide-spectrum matches (PSMs), ranked by quality,
    • percolator.target.peptides.txt – a list of peptides, ranked by quality,
    • percolator.decoy.psms.txt – a ranked list of decoy PSMs,
    • percolator.decoy.peptides.txt – a ranked list of decoy peptides,
    • percolator.pout.xml – a single XML output file containing all of the Percolator results,
    • make-pin.pin.xml – an intermediate XML format file that is used by Percolator,
    • percolator.params.txt – parameter file, and
    • percolator.log.txt – log file.
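A quick way to confirm that the run produced everything is to test for each file by name. The sketch below assumes the default crux-output directory and uses the file names from the list above; missing_files is a hypothetical helper written for this example, not part of Crux.

```shell
#!/bin/sh
# missing_files DIR NAME...
# Prints each NAME that does not exist as a regular file inside DIR and
# returns non-zero if any are missing. Hypothetical helper, not part of Crux.
missing_files() {
  dir=$1
  shift
  status=0
  for name in "$@"; do
    if [ ! -f "$dir/$name" ]; then
      echo "missing: $dir/$name"
      status=1
    fi
  done
  return "$status"
}

# File names are the ones listed above for the percolator step.
missing_files crux-output \
  percolator.target.psms.txt percolator.target.peptides.txt \
  percolator.decoy.psms.txt percolator.decoy.peptides.txt \
  percolator.pout.xml make-pin.pin.xml \
  percolator.params.txt percolator.log.txt ||
  echo "Some Percolator outputs were not found; re-check the percolator step."
```

The same helper can be pointed at the files from the earlier steps by changing the names passed to it.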

    The beginning of the percolator.target.psms.txt file will look like this:

    scan  charge  spectrum precursor m/z  spectrum neutral mass  peptide mass  percolator score  percolator rank  percolator q-value  matches/spectrum  sequence                      cleavage type        protein id  flanking aa
    26    2       692.3823                1382.7500              1382.4467     11.5536087        17               0                   5                 TASEFDSAIAQDK                 trypsin-full-digest  YLR043C     LK
    11    2       745.2723                1488.5300              1489.7318     9.9082899         2                0                   5                 NFLETVELQVGLK                 trypsin-full-digest  YGL135W     NR
    50    2       651.2922                1300.5699              1301.4160     9.614749          3                0                   10                LDVDELGDVAQK                  trypsin-full-digest  YLR043C     NK
    85    3       497.6206                1489.8400              1489.7318     9.5444765         4                0                   3                 NFLETVELQVGLK                 trypsin-full-digest  YGL135W     NR
    118   3       1031.9407               3092.8000              3095.2366     9.1175385         5                0                   3                 ELESAAYDHAEPVQPEDAPQDIANDELK  trypsin-full-digest  YGL009C     DK
    131   2       745.8522                1489.6899              1489.7318     8.8824759         6                0                   3                 NFLETVELQVGLK                 trypsin-full-digest  YGL135W     NR
    146   2       692.6823                1383.3500              1382.4467     8.7510319         7                0                   5                 TASEFDSAIAQDK                 trypsin-full-digest  YLR043C     LK

    In this output, the PSMs are ranked by "percolator score," with higher scores indicating a higher quality match. The associated statistical confidence estimate is reported as a "percolator q-value," interpreted as the minimal false discovery rate threshold at which this match is deemed significant. In the list above, all of the matches have q-values of 0, meaning that they are highly significant. The meanings of the remaining columns are described here.
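To keep only the matches below a chosen q-value threshold, the tab-delimited PSM file can be filtered on its "percolator q-value" column. The following is a minimal sketch under stated assumptions: filter_by_qvalue is a hypothetical helper (not a Crux command), and the 0.01 threshold simply mirrors the 1% FDR level that appears in the Percolator log.

```shell
#!/bin/sh
# filter_by_qvalue FILE THRESHOLD
# Prints the header line plus every row whose "percolator q-value" is at most
# THRESHOLD. The column is located by name in the header row, so the helper
# does not depend on column order. Hypothetical helper, not part of Crux.
filter_by_qvalue() {
  awk -F '\t' -v thr="$2" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "percolator q-value") col = i
      if (!col) { print "no percolator q-value column found" > "/dev/stderr"; exit 1 }
      print
      next
    }
    ($col + 0) <= (thr + 0)
  ' "$1"
}

# Keep PSMs at a 1% threshold if the Percolator output is present.
if [ -f crux-output/percolator.target.psms.txt ]; then
  filter_by_qvalue crux-output/percolator.target.psms.txt 0.01
fi
```

Looking the column up by its header name rather than by a fixed position keeps the filter usable if the column order differs between Crux versions.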

