SQT file format
The SQT file format is used to record the matches between MS/MS spectra and a sequence database. A full description of the SQT file format may be found in:
McDonald, W.H. et al. "MS1, MS2, and SQT-three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications." Rapid Communications in Mass Spectrometry. 18, 2162-2168 (2004).An SQT file consists of a header followed by one or more spectrum matches. The header and match data are broken into records, one record per line. Fields within a record are separated by white space.
A sample SQT file may be found here.
Header details
Each line in the header must begin with an
H
. This is followed by a field label, and then a field value, all separated by white space. The field label must be one of the labels listed below, while the field value can be an aribrary string. A typical header is shown below.H SQTGenerator Crux H SQTGeneratorVersion 1.0 H Comment Crux was written by... H Comment ref... H StartTime Tue Dec 2 15:22:50 2008 H EndTime H Database test_crux_index/test-binary-fasta H DBSeqLength ? H DBLocusCount 4 H PrecursorMasses average H FragmentMasses mono H Alg-PreMasTol 3.0 H Alg-FragMassTol 0.50 H Alg-XCorrMode 0 H Comment preliminary algorithm sp H Comment final algorithm xcorr H StaticMod C=160.139 H Alg-DisplayTop 5 H EnzymeSpec trypticThe following field labels must appear in the header:
Required header field labels Field Label Description SQTGenerator The name of the software used to create the SQT file SQTGeneratorVersion The version number of the SQTGenerator software Database Path to sequence database used to generate the SQT file FragmentMasses Were average or mono-istopic residue masses used to predict the fragment ion mass PrecursorMasses Were average or mono-istopic residue masses used to predict the precursor ion mass StartTime Time when SQT file was created StaticMod Non-standard amino-acid masses used in identification (repeat this record if there are multiple non-standard masses) DynamicMod List of dynamic modifications used in identification The following field labels are optional, and may appear in the header:
Optional header field labels Field Label Description Comment Remarks. Multiple comment lines are allowed DBSeqLength Number of aminio acids in sequence database DBLocusCount Number of proteins in sequence database DBMD5Sum MD5 checksum of sequence database SortedBy Name of field use to sort spectra Alg- Field names begining with Alg- are algorithm specific * Other field names are allowed, but may not contain white-space Match details
Each spectrum match begins with a spectrum (S) record, each S record is followed by one or more match (M) records, and each M record is followed by one or more Locus (L) records. The S record contains the scan number and other details for the spectum, the M records contains the highest scoring peptide matches for the parent spectrum, and the L records give the names of proteins containing the parent match peptide.
S 45894 45894 2 1 maccoss007 2038.59 9199.5 147.1 153628 M 1 27 2040.244 0.0000 1.5881 245.6 11 34 V.YKCAADKQDATVVELTNL.T U L YCR102C M 2 68 2038.265 0.0116 1.5698 208.4 11 36 S.TQSGIVAEQALLHSLNENL.S U L YGR080W M 3 34 2039.247 0.1582 1.3369 239.3 11 36 I.NEKTSPALVIPTPDAENEI.S U L YLR035C M 4 322 2040.365 0.1699 1.3183 160.0 9 36 I.LKESKSVQPGKAIPDIIES.P U L YJL126W M 5 74 2039.453 0.2288 1.2248 203.6 10 32 D.MISVDLKTPLVIFKCHH.G U L YAL002W M 65 1 2041.246 0.4179 0.9245 370.2 13 32 S.CCGLSLPGLHDLLRHYE.E U L YLR403W S 45904 45904 3 1 maccoss007 2834.54 10103.7 246.4 152668 M 1 237 2833.059 0.0000 1.9175 273.1 20 108 N.NSGSDTVDPLAGLNNLRNSIKSAGNGME.N U L YDR505C M 2 223 2834.278 0.1390 1.6510 274.8 18 96 G.HLSRISNIDDILISMRMDAFDSLIG.Y U L YLR247C M 3 52 2835.100 0.1503 1.6292 324.1 20 96 S.KSTTEPIQLNNKHDLHLGQELTEST.V U L YDR098C-B L YDR365W-B L YER138C L YGR027W-B
Record types for scans Symbol Description Generic form Required S Spectrum S [low scan] [high scan] [charge] [process time]
[server] [experimental neutral mass] [total ion intensity] [lowest Sp] [# of seq. matched]yes M Match M [rank by Xcorr] [rank by Sp] [calculated mass]
[DeltaCN] [Xcorr] [Sp] [matched ions] [expected ions] [sequence matched]
[validation status U = unknown, Y = yes, N = no, M = Maybe]yes L Locus L [locus name] [description if available]
yes Differences between Crux's SQT format and the published version
- Relative to the original specification of SQT, Crux expects an additional field [total ion intensity] in the "S" lines.
- In the original specification, the FragmentMasses and PrecursorMasses header lines used the values "AVG" or "MONO", but Crux uses "average" and "mono".