crux psm-convert
Usage:
crux psm-convert [options] <input PSM file> <output format>
Description:
This command reads in a file containing peptide-spectrum matches (PSMs) in one of a variety of supported formats and outputs the same PSMs in a different format.
Input:
- <input PSM file> – The name of a PSM file in tab-delimited text, SQT, PIN, pepXML or mzIdentML format.
- <output format> – The desired format of the output file. Legal values are tsv, html, sqt, pin, pepxml, mzidentml, barista-xml.
Output:
The program writes the following files to the folder crux-output. The name of the output folder can be set by the user using the --output-dir option.
- psm-convert.<format>: a file containing the input PSMs in the requested format.
- psm-convert.log.txt: a log file containing a copy of all messages that were printed to the screen during execution.
- psm-convert.params.txt: a file containing the name and value of all parameters/options for the current operation. Not all parameters in the file may have been used in the operation. The resulting file can be used with the --parameter-file option for other crux programs.
Options:
- --input-format auto|tsv|html|sqt|pin|pepxml|mzidentml|barista-xml – The format of the input PSMs. The keyword "auto" will cause the program to determine the file format based upon the filename extension. Default = auto.
- --output-dir <filename> – The name of the directory where output files will be created. Default = crux-output.
- --overwrite T|F – Replace existing files if true (T) or fail when trying to overwrite a file if false (F). Default = F.
- --parameter-file <filename> – A file containing command-line or additional parameters. See the parameter documentation page for details. Default = no parameter file.
- --verbosity <0-100> – Specify the verbosity of the current processes. Each level prints the following messages, including all those at lower verbosity levels: 0-fatal errors, 10-non-fatal errors, 20-warnings, 30-information on the progress of execution, 40-more progress information, 50-debug info, 60-detailed debug info. Default = 30.
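The "auto" behaviour of --input-format could be sketched in C++ roughly as follows. This is only an illustration: the function names guessFormat and resolveInputFormat are hypothetical, and the extension table is an assumption made for the example, not the mapping Crux actually uses.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true if `name` ends with `suffix`.
static bool endsWith(const std::string& name, const std::string& suffix) {
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Guess a format keyword from a filename; returns "" when nothing matches.
// ASSUMPTION: the suffix-to-format table below is illustrative only.
std::string guessFormat(std::string filename) {
  // Compare case-insensitively by lower-casing the filename first.
  std::transform(filename.begin(), filename.end(), filename.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  static const std::vector<std::pair<std::string, std::string>> table = {
      {".pep.xml", "pepxml"}, {".pepxml", "pepxml"}, {".mzid", "mzidentml"},
      {".sqt", "sqt"},        {".pin", "pin"},       {".html", "html"},
      {".tsv", "tsv"},        {".txt", "tsv"}};
  for (const auto& entry : table) {
    if (endsWith(filename, entry.first)) return entry.second;
  }
  return "";
}

// An explicit --input-format value wins; "auto" falls back to the extension.
std::string resolveInputFormat(const std::string& option,
                               const std::string& filename) {
  return option == "auto" ? guessFormat(filename) : option;
}
```

A caller would presumably treat an empty result as an error and ask the user to pass --input-format explicitly.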
Implementation suggestions:
- Subclass CruxApplication to create a PSMConvertApplication class.
- Looking at MzIdentMLWriter, PMCPepXMLWriter, PepXMLWriter, SQTWriter, MatchFileWriter, etc., derive an abstract class "PSMWriter" with virtual methods for writing out a MatchCollection or ProteinMatchCollection. Make all writers subclasses of this abstract class. You will need to standardize the abstract method calls over all of the writers, e.g. virtual writePSM(Match*), writeCollection(MatchCollection*), etc.
- Similar to the previous step, look at the MzIdentMLReader, PepXMLReader, SQTReader, and MatchFileReader classes and derive an abstract class "PSMReader" that returns a MatchCollection or ProteinMatchCollection after parsing the input data file.
- Implement PSMConvertApplication, which parses the <input PSM file> using the correct file format parser (determined either by the file extension or by the --input-format parameter) and then writes out the PSMs in the <output format>.
Notes:
- OutputFiles.h and OutputFiles.cpp give many examples of how to write out PSMs from the internal data objects.
- MatchCollectionParser.h and MatchCollectionParser.cpp give an example of parsing a PSM file to create a MatchCollection object.
- Some file formats (e.g. mzid) need the protein database in order to be fully parsed or written.
- There is a PMCMatchCollection object that handles the associations between proteins, peptides, and PSMs. It might be better to use this object and phase out MatchCollection in the future. This might need some discussion.
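The reader/writer design outlined under the implementation suggestions might look roughly like the following sketch. It is not Crux code: Psm and PsmCollection are simplified stand-ins for Match and MatchCollection, the stream-based signatures are an assumption, and TsvReader and CommaWriter are toy classes (the comma-separated output is not one of the formats psm-convert supports) that exist only to show the shape of the abstraction.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for Crux's Match; the real class carries far more information.
struct Psm {
  int scan = 0;
  std::string peptide;
  double score = 0.0;
};
// Stand-in for Crux's MatchCollection.
using PsmCollection = std::vector<Psm>;

// Abstract writer: each output format overrides writePSM().
class PSMWriter {
 public:
  virtual ~PSMWriter() = default;
  virtual void writeHeader(std::ostream&) {}  // optional per-format header
  virtual void writePSM(std::ostream& out, const Psm& psm) = 0;
  void writeCollection(std::ostream& out, const PsmCollection& psms) {
    writeHeader(out);
    for (const Psm& psm : psms) writePSM(out, psm);
  }
};

// Abstract reader: each input format returns a parsed collection.
class PSMReader {
 public:
  virtual ~PSMReader() = default;
  virtual PsmCollection read(std::istream& in) = 0;
};

// Toy reader: one header row, then "scan<TAB>peptide<TAB>score" rows.
class TsvReader : public PSMReader {
 public:
  PsmCollection read(std::istream& in) override {
    PsmCollection psms;
    std::string line;
    std::getline(in, line);  // skip the header row
    while (std::getline(in, line)) {
      std::istringstream fields(line);
      Psm psm;
      // Rows that do not parse are silently skipped in this sketch.
      if (fields >> psm.scan >> psm.peptide >> psm.score) psms.push_back(psm);
    }
    return psms;
  }
};

// Toy writer with an invented comma-separated layout (illustration only).
class CommaWriter : public PSMWriter {
 public:
  void writePSM(std::ostream& out, const Psm& psm) override {
    out << psm.scan << ',' << psm.peptide << ',' << psm.score << '\n';
  }
};

// Conversion reduces to: read with one implementation, write with another.
void convert(PSMReader& reader, std::istream& in,
             PSMWriter& writer, std::ostream& out) {
  writer.writeCollection(out, reader.read(in));
}
```

With this shape, PSMConvertApplication would only need to pick one reader and one writer based on the requested formats; formats that need the protein database would have to receive it through their constructors or a similar mechanism.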