Central limit theorem as an approximation for intensity-based scoring function

Rovshan Sadygov, James Wohlschlegel, Sung Kyu Park, Tao Xu, John R. Yates

Research output: Contribution to journal › Article

26 Citations (Scopus)

Abstract

In this paper, we present an intensity-based probability function to identify peptides from tandem mass spectra and amino acid sequence databases. The function is an approximation based on the central limit theorem, and it explicitly depends on the cumulative product ion intensities, the number of product ions of a peptide, and the expectation value of the cumulative intensity. We compare the results of database searches using the new scoring function with those from the scoring functions of earlier algorithms, which implement hypergeometric probability, the Poisson model, and cross-correlation scores. For a standard protein mixture (tandem mass spectra generated from a mixture of five known proteins), we generate receiver operating characteristic (ROC) curves with all scoring schemes. The ROC curves show that the probability methods based on shared peak counts (such as the Poisson and hypergeometric models) are the most specific for matching high-quality tandem mass spectra. The intensity-based (central limit model) and intensity-modeled (cross-correlation) methods are more sensitive when matching low-quality tandem mass spectra, where the number of shared peaks is insufficient to correctly identify a peptide. Cross-correlation methods show a small advantage over the intensity-based probability method.
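The abstract names the inputs of the scoring function (cumulative matched intensity, number of product ions, and the expected cumulative intensity) but does not give the formula. The following is a minimal sketch of how a central-limit-theorem tail probability could be built from those inputs. It is not the authors' published function: the name `clt_tail_probability`, the use of a per-peak mean and variance, and the assumption that matched intensities behave as independent, identically distributed draws are all illustrative assumptions.

```python
import math


def clt_tail_probability(matched_intensities, mean_intensity, var_intensity):
    """Approximate P(random cumulative intensity >= observed) with a normal tail.

    Hypothetical helper, not the paper's exact scoring function.
    matched_intensities: intensities of spectrum peaks matched to product ions.
    mean_intensity, var_intensity: assumed per-peak mean and variance of the
    intensity under random matching (e.g. estimated from the whole spectrum).
    """
    n = len(matched_intensities)
    if n == 0 or var_intensity <= 0:
        # No matched ions or no spread: no evidence against a random match.
        return 1.0

    observed_sum = sum(matched_intensities)   # cumulative product ion intensity
    expected_sum = n * mean_intensity         # expectation of the cumulative intensity

    # CLT: the sum of n i.i.d. intensities is approximately normal with
    # mean n * mu and variance n * sigma^2.
    z = (observed_sum - expected_sum) / math.sqrt(n * var_intensity)

    # Upper tail of the standard normal: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Matched peaks well above the assumed random mean give a small probability.
    print(clt_tail_probability([3.0, 3.0, 3.0, 3.0], mean_intensity=1.0, var_intensity=1.0))
```

A smaller returned probability means the observed cumulative intensity is less likely to arise from a random match, so a practical score would typically be reported as its negative logarithm.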

Original language: English (US)
Pages (from-to): 89-95
Number of pages: 7
Journal: Analytical Chemistry
Volume: 78
Issue number: 1
DOIs: 10.1021/ac051206r
State: Published - Jan 1 2006
Externally published: Yes

Fingerprint

Correlation methods
Peptides
Ions
Proteins
Amino Acids

ASJC Scopus subject areas

  • Analytical Chemistry

Cite this

Central limit theorem as an approximation for intensity-based scoring function. / Sadygov, Rovshan; Wohlschlegel, James; Park, Sung Kyu; Xu, Tao; Yates, John R.

In: Analytical Chemistry, Vol. 78, No. 1, 01.01.2006, p. 89-95.

Sadygov, Rovshan ; Wohlschlegel, James ; Park, Sung Kyu ; Xu, Tao ; Yates, John R. / Central limit theorem as an approximation for intensity-based scoring function. In: Analytical Chemistry. 2006 ; Vol. 78, No. 1. pp. 89-95.
@article{19583496d3694c638bf1c7c8f276822b,
title = "Central limit theorem as an approximation for intensity-based scoring function",
abstract = "In this paper, we present an intensity-based probability function to identify peptides from tandem mass spectra and amino acid sequence databases. The function is an approximation to the central limiting theorem, and it explicitly depends on the cumulative product ion intensities, number of product ions of a peptide, and expectation value of the cumulative intensity. We compare the results of database searches using the new scoring function and scoring functions from earlier algorithms, which implement hypergeometric probability, Poisson's model, and cross-correlation scores. For a standard protein mixture (tandem mass spectra generated from the mixture of five known proteins), we generate receiver operating curves with all scoring schemes. The receiver operating curves show that the shared peaks count-based probability methods (like Poisson and hypergeometric models) are the most specific for matching high-quality tandem mass spectra. The intensity-based (central limit model) and intensity-modeled (cross-correlation) methods are more sensitive when matching low-quality tandem mass spectra, where the number of shared peaks is insufficient to correctly identify a peptide. Cross-correlation methods show a small advantage over the intensity-based probability method.",
author = "Rovshan Sadygov and James Wohlschlegel and Park, {Sung Kyu} and Tao Xu and Yates, {John R.}",
year = "2006",
month = "1",
day = "1",
doi = "10.1021/ac051206r",
language = "English (US)",
volume = "78",
pages = "89--95",
journal = "Analytical Chemistry",
issn = "0003-2700",
publisher = "American Chemical Society",
number = "1",

}

TY  - JOUR
T1  - Central limit theorem as an approximation for intensity-based scoring function
AU  - Sadygov, Rovshan
AU  - Wohlschlegel, James
AU  - Park, Sung Kyu
AU  - Xu, Tao
AU  - Yates, John R.
PY  - 2006/1/1
Y1  - 2006/1/1
N2  - In this paper, we present an intensity-based probability function to identify peptides from tandem mass spectra and amino acid sequence databases. The function is an approximation to the central limiting theorem, and it explicitly depends on the cumulative product ion intensities, number of product ions of a peptide, and expectation value of the cumulative intensity. We compare the results of database searches using the new scoring function and scoring functions from earlier algorithms, which implement hypergeometric probability, Poisson's model, and cross-correlation scores. For a standard protein mixture (tandem mass spectra generated from the mixture of five known proteins), we generate receiver operating curves with all scoring schemes. The receiver operating curves show that the shared peaks count-based probability methods (like Poisson and hypergeometric models) are the most specific for matching high-quality tandem mass spectra. The intensity-based (central limit model) and intensity-modeled (cross-correlation) methods are more sensitive when matching low-quality tandem mass spectra, where the number of shared peaks is insufficient to correctly identify a peptide. Cross-correlation methods show a small advantage over the intensity-based probability method.
AB  - In this paper, we present an intensity-based probability function to identify peptides from tandem mass spectra and amino acid sequence databases. The function is an approximation to the central limiting theorem, and it explicitly depends on the cumulative product ion intensities, number of product ions of a peptide, and expectation value of the cumulative intensity. We compare the results of database searches using the new scoring function and scoring functions from earlier algorithms, which implement hypergeometric probability, Poisson's model, and cross-correlation scores. For a standard protein mixture (tandem mass spectra generated from the mixture of five known proteins), we generate receiver operating curves with all scoring schemes. The receiver operating curves show that the shared peaks count-based probability methods (like Poisson and hypergeometric models) are the most specific for matching high-quality tandem mass spectra. The intensity-based (central limit model) and intensity-modeled (cross-correlation) methods are more sensitive when matching low-quality tandem mass spectra, where the number of shared peaks is insufficient to correctly identify a peptide. Cross-correlation methods show a small advantage over the intensity-based probability method.
UR  - http://www.scopus.com/inward/record.url?scp=30044442277&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=30044442277&partnerID=8YFLogxK
U2  - 10.1021/ac051206r
DO  - 10.1021/ac051206r
M3  - Article
VL  - 78
SP  - 89
EP  - 95
JO  - Analytical Chemistry
JF  - Analytical Chemistry
SN  - 0003-2700
IS  - 1
ER  - 