MS1, MS2, and SQT - Three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications

W. Hayes McDonald, David L. Tabb, Rovshan Sadygov, Michael J. MacCoss, John Venable, Johannes Graumann, Jeff R. Johnson, Daniel Cociorva, John R. Yates

Research output: Contribution to journal › Article

235 Citations (Scopus)

Abstract

As the speed with which proteomic labs generate data increases along with the scale of projects they are undertaking, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques that can generate tens of thousands of spectra per instrument each day. One design factor leading to many of these problems is caused by storing spectra and the database identifications for a given spectrum as individual files. While these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machine and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies.

Original language: English (US)
Pages (from-to): 2162-2168
Number of pages: 7
Journal: Rapid Communications in Mass Spectrometry
Volume: 18
Issue number: 18
DOI: 10.1002/rcm.1603
State: Published - 2004
Externally published: Yes

Fingerprint

Data mining
Data storage equipment
Proteomics

ASJC Scopus subject areas

  • Analytical Chemistry
  • Spectroscopy

Cite this

MS1, MS2, and SQT - Three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications. / McDonald, W. Hayes; Tabb, David L.; Sadygov, Rovshan; MacCoss, Michael J.; Venable, John; Graumann, Johannes; Johnson, Jeff R.; Cociorva, Daniel; Yates, John R.

In: Rapid Communications in Mass Spectrometry, Vol. 18, No. 18, 2004, p. 2162-2168.

@article{a33d3662a88a4849ad191ebb0699ca2b,
title = "MS1, MS2, and SQT - Three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications",
abstract = "As the speed with which proteomic labs generate data increases along with the scale of projects they are undertaking, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques that can generate tens of thousands of spectra per instrument each day. One design factor leading to many of these problems is caused by storing spectra and the database identifications for a given spectrum as individual files. While these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machine and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies.",
author = "McDonald, {W. Hayes} and Tabb, {David L.} and Rovshan Sadygov and MacCoss, {Michael J.} and John Venable and Johannes Graumann and Johnson, {Jeff R.} and Daniel Cociorva and Yates, {John R.}",
year = "2004",
doi = "10.1002/rcm.1603",
language = "English (US)",
volume = "18",
pages = "2162--2168",
journal = "Rapid Communications in Mass Spectrometry",
issn = "0951-4198",
publisher = "John Wiley and Sons Ltd",
number = "18",
}

TY - JOUR

T1 - MS1, MS2, and SQT - Three unified, compact, and easily parsed file formats for the storage of shotgun proteomic spectra and identifications

AU - McDonald, W. Hayes

AU - Tabb, David L.

AU - Sadygov, Rovshan

AU - MacCoss, Michael J.

AU - Venable, John

AU - Graumann, Johannes

AU - Johnson, Jeff R.

AU - Cociorva, Daniel

AU - Yates, John R.

PY - 2004

Y1 - 2004

N2 - As the speed with which proteomic labs generate data increases along with the scale of projects they are undertaking, the resulting data storage and data processing problems will continue to challenge computational resources. This is especially true for shotgun proteomic techniques that can generate tens of thousands of spectra per instrument each day. One design factor leading to many of these problems is caused by storing spectra and the database identifications for a given spectrum as individual files. While these problems can be addressed by storing all of the spectra and search results in large relational databases, the infrastructure to implement such a strategy can be beyond the means of academic labs. We report here a series of unified text file formats for storing spectral data (MS1 and MS2) and search results (SQT) that are compact, easily parsed by both machine and humans, and yet flexible enough to be coupled with new algorithms and data-mining strategies.

UR - http://www.scopus.com/inward/record.url?scp=4544280727&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4544280727&partnerID=8YFLogxK

U2 - 10.1002/rcm.1603

DO - 10.1002/rcm.1603

M3 - Article

C2 - 15317041

AN - SCOPUS:4544280727

VL - 18

SP - 2162

EP - 2168

JO - Rapid Communications in Mass Spectrometry

JF - Rapid Communications in Mass Spectrometry

SN - 0951-4198

IS - 18

ER -