barista.xml

barista.xml is an XML file that records four main sections:

  • <proteins> ... </proteins>
  • <subset_proteins> ... </subset_proteins>
  • <peptides> ... </peptides>
  • <psms> ... </psms>
    1. Proteins: contains the ranked list of groups of indistinguishable target proteins. Each protein entry includes the following fields:
      1. protein group: a number (the group_id attribute) that identifies the group.
      2. q-value: The minimal protein-level false discovery rate at which this protein is deemed significant. This q-value is computed from the ranking of the proteins induced by the Barista score.
      3. score: The score assigned to the protein group by Barista. Higher values correspond to more confident identifications.
      4. protein_ids: the proteins in the protein group.
      5. alternative_peptide_id: peptides are considered indistinguishable if they have identical amino acid sequences or differ only by I/L or T/S at the same position in the peptide. If the peptides shared by the proteins in a group are not identical, each protein's variant is listed as an alternative_peptide_id immediately after that protein's protein_id.
      6. peptide_ids: the peptides that belong to each of the proteins in the group.

      For example, suppose that
      protein_a has peptide KLEAEVEALKK       // 'L' in second position
      and peptide VLGAK
      protein_b has peptide KIEAEVEALKK       // 'I' in second position
      and peptide VLGAK

      Then the XML entry could look like this:
      <proteins>
        <q_value>0</q_value>
        <score>8.9</score>
        <protein_group group_id="1">
          <protein_ids>
            <protein_id>protein_a</protein_id>
            <alternative_peptide_id>KLEAEVEALKK</alternative_peptide_id>
            <protein_id>protein_b</protein_id>
            <alternative_peptide_id>KIEAEVEALKK</alternative_peptide_id>
          </protein_ids>
          <peptide_ids>
            <peptide_id>VLGAK</peptide_id>
          </peptide_ids>
        </protein_group>
      </proteins>
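      The indistinguishability rule above can be sketched as a position-by-position comparison. This is a minimal illustration, not Barista's own code: the function indistinguishable and the EQUIVALENT_PAIRS table are hypothetical, and the sketch assumes only equal-length sequences can match, since the rule compares residues at the same position.

```python
# Hypothetical illustration of the I/L and T/S rule; not taken from Barista.
# Residue pairs that the rule treats as interchangeable.
EQUIVALENT_PAIRS = {frozenset("IL"), frozenset("TS")}


def indistinguishable(pep_a: str, pep_b: str) -> bool:
    """Return True if the peptides are identical or differ only by I/L or T/S swaps."""
    # Residues are compared at the same position, so lengths must agree.
    if len(pep_a) != len(pep_b):
        return False
    for a, b in zip(pep_a, pep_b):
        # Any differing position must be one of the allowed swaps.
        if a != b and frozenset((a, b)) not in EQUIVALENT_PAIRS:
            return False
    return True


if __name__ == "__main__":
    # Peptides from the protein_a / protein_b example.
    print(indistinguishable("KLEAEVEALKK", "KIEAEVEALKK"))  # L vs I in second position
    print(indistinguishable("VLGAK", "VLGAK"))              # identical
    print(indistinguishable("VLGAK", "VLGSK"))              # A vs S is not an allowed swap
```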
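      A minimal sketch of reading such an entry with Python's standard xml.etree.ElementTree module follows. It assumes the layout of the example above, where each alternative_peptide_id belongs to the protein_id that precedes it; the function read_protein_groups is hypothetical and is not part of Crux or Barista.

```python
import xml.etree.ElementTree as ET

# The example entry, written with well-formed tags.
EXAMPLE = """\
<proteins>
  <q_value>0</q_value>
  <score>8.9</score>
  <protein_group group_id="1">
    <protein_ids>
      <protein_id>protein_a</protein_id>
      <alternative_peptide_id>KLEAEVEALKK</alternative_peptide_id>
      <protein_id>protein_b</protein_id>
      <alternative_peptide_id>KIEAEVEALKK</alternative_peptide_id>
    </protein_ids>
    <peptide_ids>
      <peptide_id>VLGAK</peptide_id>
    </peptide_ids>
  </protein_group>
</proteins>
"""


def read_protein_groups(xml_text):
    """Return one dict per protein_group: id, proteins with alternative peptides, peptide_ids."""
    root = ET.fromstring(xml_text)
    groups = []
    for group in root.iter("protein_group"):
        proteins = {}   # protein_id -> list of its alternative_peptide_id values
        current = None  # most recent protein_id seen, in document order
        for child in group.findall("protein_ids/*"):
            text = (child.text or "").strip()
            if child.tag == "protein_id":
                current = text
                proteins[current] = []
            elif child.tag == "alternative_peptide_id" and current is not None:
                # Attach the variant to the protein listed just before it.
                proteins[current].append(text)
        peptides = [(p.text or "").strip() for p in group.findall("peptide_ids/peptide_id")]
        groups.append({"group_id": group.get("group_id"), "proteins": proteins, "peptide_ids": peptides})
    return groups


if __name__ == "__main__":
    for g in read_protein_groups(EXAMPLE):
        print(g)
```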
    2. Subset proteins: contains groups of indistinguishable proteins whose identified peptides form a subset of the peptides identified for some group in the proteins section. Each entry includes:

      1. group id and parent group id: the identifier of the group, and the identifier of the protein group whose peptide set is a superset of the peptide set belonging to the current group.
      2. protein_ids: proteins that belong to the group.
      3. peptide_ids: peptides that belong to the proteins in the group.
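      The parent relationship can be illustrated with set inclusion over peptide sets. This sketch assumes each group is represented as a set of peptide sequences and that a subset group's peptides form a proper subset of its parent's (an equal set would make the proteins indistinguishable). The helper find_parent_groups and all peptides other than VLGAK are hypothetical, and how Barista chooses a single parent when several groups qualify is not described here.

```python
def find_parent_groups(subset_peptides, groups):
    """Return ids of groups whose peptide set strictly contains subset_peptides.

    subset_peptides: set of peptide sequences identified for the subset group.
    groups: dict mapping group id -> set of peptide sequences (hypothetical layout).
    """
    # '<' on Python sets tests for a proper subset.
    return [gid for gid, peptides in groups.items() if subset_peptides < peptides]


if __name__ == "__main__":
    # Hypothetical data; group "1" reuses VLGAK from the protein example.
    groups = {
        "1": {"VLGAK", "KLEAEVEALKK", "AGHLK"},
        "2": {"MSTPK", "QWERK"},
    }
    print(find_parent_groups({"VLGAK", "AGHLK"}, groups))
```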
    3. Peptides: contains a ranked list of target peptides. Each peptide entry includes:

      1. peptide: Peptide amino acid sequence.
      2. q-value: The minimal peptide-level false discovery rate at which this peptide is deemed significant. This q-value is computed based on the ranking of the peptides induced by the Barista score.
      3. score: The score assigned to the peptide by Barista. Higher values correspond to more confident identifications.
      4. main_psm_id: The identifier of the PSM from which the peptide received its score. A peptide's score is the maximum score over all the PSMs that contain this peptide.
      5. psm_ids: The identifiers of all the PSMs that contain this peptide.
      6. protein_ids: All the proteins that contain this peptide and were inferred from one or more PSMs in the database search.
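      The peptide score (the maximum over the peptide's PSMs) and the main_psm_id that supplies it can be sketched as follows. The input tuple layout (psm_id, peptide, score), the function score_peptides, and the scores are hypothetical; keeping the first PSM on a tie is an assumption, since tie handling is not specified.

```python
from collections import defaultdict


def score_peptides(psms):
    """Aggregate PSM scores into peptide scores, ranked best first.

    psms: iterable of (psm_id, peptide, score) tuples (hypothetical layout).
    Returns (peptide, record) pairs sorted by descending score, where record
    holds the peptide score, its main_psm_id and all of its psm_ids.
    """
    best = {}                    # peptide -> (best score, psm_id that achieved it)
    members = defaultdict(list)  # peptide -> every psm_id containing the peptide
    for psm_id, peptide, score in psms:
        members[peptide].append(psm_id)
        # Strict '>' keeps the first PSM seen when scores tie (an assumption).
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, psm_id)
    records = {
        pep: {"score": s, "main_psm_id": pid, "psm_ids": members[pep]}
        for pep, (s, pid) in best.items()
    }
    return sorted(records.items(), key=lambda kv: kv[1]["score"], reverse=True)


if __name__ == "__main__":
    psms = [
        ("psm1", "VLGAK", 2.1),
        ("psm2", "VLGAK", 3.4),
        ("psm3", "KLEAEVEALKK", 1.7),
    ]
    for peptide, rec in score_peptides(psms):
        print(peptide, rec)
```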
    4. PSMs: contains a ranked list of target peptide-spectrum matches (PSMs). Each PSM entry includes the following fields:

      1. psm_id: PSM identifier.
      2. q-value: The minimal PSM-level false discovery rate at which this PSM is deemed significant. This q-value is computed based on the ranking of the PSMs induced by the Barista score.
      3. score: The score assigned to the PSM by Barista. Higher values correspond to more confident identifications.
      4. scan: the scan number
      5. charge: the inferred charge state
      6. precursor_mass: precursor mass as recorded during the MS1 scan
      7. peptide: the peptide sequence
      8. filename: name of the file in which the PSM appears
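      Each q-value above is the minimal FDR at which an item is deemed significant. Given an estimated FDR for every cutoff in the score-ranked list, that minimum is a running minimum taken from the bottom of the list upward. How Barista estimates the FDR at each cutoff is not specified in this section, so this sketch takes those estimates as input; the function q_values_from_fdr and the numbers are hypothetical.

```python
def q_values_from_fdr(fdr_by_rank):
    """Convert per-cutoff FDR estimates into q-values.

    fdr_by_rank[i] is the estimated FDR when the top i + 1 items (best score
    first) are accepted. The q-value of item i is the smallest FDR over all
    cutoffs that still include it, i.e. min(fdr_by_rank[i:]).
    """
    q = [0.0] * len(fdr_by_rank)
    running_min = float("inf")
    # Walk from the lowest-ranked item upward, carrying the minimum seen so far.
    for i in range(len(fdr_by_rank) - 1, -1, -1):
        running_min = min(running_min, fdr_by_rank[i])
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical FDR estimates for five items ranked by Barista score.
    print(q_values_from_fdr([0.0, 0.5, 0.33, 0.25, 0.4]))
```

      Taking the running minimum makes q-values non-decreasing down the ranked list, which is why an item's q-value can be lower than the FDR estimated at its own cutoff.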