Code developments to improve the efficiency of automated MS/MS spectra interpretation

Rovshan Sadygov, Jimmy Eng, Eberhard Durr, Anita Saraf, Hayes McDonald, Michael J. MacCoss, John R. Yates

Research output: Contribution to journal › Article

174 Citations (Scopus)

Abstract

We report the results of our work to facilitate protein identification using tandem mass spectra and protein sequence databases. We describe a parallel version of SEQUEST (SEQUEST-PVM) that is tolerant toward arithmetic exceptions. The changes we report effectively separate search processes on slave nodes from each other. Therefore, if one of the slave nodes drops out of the cluster due to an error, the rest of the cluster will carry the search process to the end. SEQUEST has been widely used for protein identifications. The modifications made to the code improve its stability and effectiveness in a high-throughput production environment. We evaluate the overhead associated with the parallelization of SEQUEST. A prior version of software to preprocess LC/MS/MS data attempted to differentiate the charge states of ions. Singly charged ions can be accurately identified, but the software was unable to reliably differentiate tandem mass spectra of +2 and +3 charge states. We have designed and implemented a computational approach to narrow down the charge states of precursor ions from nominal-resolution ion-trap tandem mass spectra. The preprocessing code, 2to3, determines the charge state of the precursor ion using its mass-to-charge ratio (m/z) and the fragment ions contained in the tandem mass spectrum. For each possible charge state, the program calculates the expected fragment ions that account for precursor ion m/z values. If any one of the numbers is less than an empirically determined threshold value, then the spectrum corresponding to that charge state is removed. If both numbers are higher than the threshold value, then +2 and +3 copies of the spectrum are kept. We present the comparison of results from protein identification experiments with and without using 2to3. It is shown that by determining the charge state and eliminating poor-quality spectra, 2to3 decreases the number of spectral files to be searched without affecting the search results. The decrease reduces computer requirements and researcher effort for analysis of the results.
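The charge-state filter described in the abstract can be illustrated with a small sketch. This is not the authors' 2to3 code: the abstract does not say exactly which numbers are compared with the threshold, so the sketch assumes the number is a count of singly charged fragment-peak pairs whose m/z values sum to the value expected for a complementary pair at a given precursor charge. The function names, the 0.5 m/z tolerance, and the threshold of 2 are all hypothetical placeholders; the paper's threshold is empirically determined.

```python
PROTON = 1.00728  # mass of a proton in Da


def complementary_pairs(precursor_mz, charge, fragment_mzs, tol=0.5):
    """Count disjoint pairs of fragment peaks whose m/z values sum to the
    value expected for two singly charged complementary fragments.

    Assumption: only singly charged fragments are considered, which is a
    simplification for +3 precursors.
    """
    neutral_mass = charge * (precursor_mz - PROTON)
    target = neutral_mass + 2 * PROTON  # sum of two +1 complementary ions
    peaks = sorted(fragment_mzs)
    lo, hi, count = 0, len(peaks) - 1, 0
    # Two-pointer scan over the sorted peak list.
    while lo < hi:
        total = peaks[lo] + peaks[hi]
        if abs(total - target) <= tol:
            count += 1
            lo += 1
            hi -= 1
        elif total < target:
            lo += 1
        else:
            hi -= 1
    return count


def plausible_charges(precursor_mz, fragment_mzs, threshold=2, tol=0.5):
    """Return the candidate charge states (+2, +3) whose pair count reaches
    the threshold. An empty list means the spectrum would be discarded."""
    return [
        z
        for z in (2, 3)
        if complementary_pairs(precursor_mz, z, fragment_mzs, tol) >= threshold
    ]


# Toy peak list whose pairs sum to 2 * 500 = 1000, as expected for a +2 precursor.
doubly_charged_like = [200.0, 300.0, 450.0, 550.0, 700.0, 800.0]
print(plausible_charges(500.0, doubly_charged_like))
```

Under these assumptions, a spectrum that supports only one charge state is kept for that state alone, a spectrum that supports both is kept as +2 and +3 copies, and a spectrum that supports neither is dropped as poor quality, which mirrors the behavior the abstract describes.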

Original language: English (US)
Pages (from-to): 211-215
Number of pages: 5
Journal: Journal of Proteome Research
Volume: 1
Issue number: 3
DOIs: https://doi.org/10.1021/pr015514r
State: Published - May 2002
Externally published: Yes

Fingerprint

Ions
Proteins
Software
Protein Databases
Research Personnel
Throughput
Experiments

Keywords

  • Charge determination
  • Database search
  • Mass spectrometry
  • Protein identification

ASJC Scopus subject areas

  • Genetics
  • Biotechnology
  • Biochemistry

Cite this

Code developments to improve the efficiency of automated MS/MS spectra interpretation. / Sadygov, Rovshan; Eng, Jimmy; Durr, Eberhard; Saraf, Anita; McDonald, Hayes; MacCoss, Michael J.; Yates, John R.

In: Journal of Proteome Research, Vol. 1, No. 3, 05.2002, p. 211-215.

Sadygov, R, Eng, J, Durr, E, Saraf, A, McDonald, H, MacCoss, MJ & Yates, JR 2002, 'Code developments to improve the efficiency of automated MS/MS spectra interpretation', Journal of Proteome Research, vol. 1, no. 3, pp. 211-215. https://doi.org/10.1021/pr015514r
@article{cf6687e97dbe48daa7619a60ccc21c71,
title = "Code developments to improve the efficiency of automated MS/MS spectra interpretation",
keywords = "Charge determination, Database search, Mass spectrometry, Protein identification",
author = "Rovshan Sadygov and Jimmy Eng and Eberhard Durr and Anita Saraf and Hayes McDonald and MacCoss, {Michael J.} and Yates, {John R.}",
year = "2002",
month = "5",
doi = "10.1021/pr015514r",
language = "English (US)",
volume = "1",
pages = "211--215",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "3",

}

TY - JOUR

T1 - Code developments to improve the efficiency of automated MS/MS spectra interpretation

AU - Sadygov, Rovshan

AU - Eng, Jimmy

AU - Durr, Eberhard

AU - Saraf, Anita

AU - McDonald, Hayes

AU - MacCoss, Michael J.

AU - Yates, John R.

PY - 2002/5

Y1 - 2002/5

KW - Charge determination

KW - Database search

KW - Mass spectrometry

KW - Protein identification

UR - http://www.scopus.com/inward/record.url?scp=0036589948&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036589948&partnerID=8YFLogxK

U2 - 10.1021/pr015514r

DO - 10.1021/pr015514r

M3 - Article

VL - 1

SP - 211

EP - 215

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 3

ER -