A model for random sampling and estimation of relative protein abundance in shotgun proteomics

Hongbin Liu, Rovshan Sadygov, John R. Yates

Research output: Contribution to journal › Article

1800 Citations (Scopus)

Abstract

Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.
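The abstract does not state the model's equations. As a rough illustration only, the sketch below assumes the simplest random-sampling picture, which is not necessarily the model published in the paper. In this picture, each analysis independently detects a given protein with a fixed probability `p`, so the expected fraction identified after `n` analyses is 1 - (1 - p)^n. The function names and all probabilities used here are hypothetical. The value `p = 0.26` is back-calculated so that this toy model needs about 10 analyses for 95% coverage. It is not a figure reported by the authors.

```python
import math
from typing import Sequence


def fraction_identified(p_detect: float, n_runs: int) -> float:
    """Probability that a protein is seen at least once in n_runs analyses,
    assuming each analysis detects it independently with probability p_detect
    (a simplifying assumption, not the published model)."""
    return 1.0 - (1.0 - p_detect) ** n_runs


def expected_coverage(p_detect: Sequence[float], n_runs: int) -> float:
    """Expected fraction of a protein list identified after n_runs analyses,
    giving each protein its own (hypothetical) per-run detection probability."""
    if not p_detect:
        raise ValueError("need at least one detection probability")
    return sum(fraction_identified(p, n_runs) for p in p_detect) / len(p_detect)


def runs_for_saturation(p_detect: float, target: float = 0.95) -> int:
    """Smallest number of analyses whose expected coverage reaches target."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must lie strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    # Closed form: 1 - (1 - p)^n >= target  <=>  n >= log(1 - target) / log(1 - p)
    n = max(1, math.ceil(math.log(1.0 - target) / math.log(1.0 - p_detect)))
    # Nudge the estimate if floating-point rounding left it one run off.
    while fraction_identified(p_detect, n) < target:
        n += 1
    while n > 1 and fraction_identified(p_detect, n - 1) >= target:
        n -= 1
    return n


if __name__ == "__main__":
    p = 0.26  # hypothetical, back-calculated per-run detection probability
    print(runs_for_saturation(p))                # 10
    print(round(fraction_identified(p, 10), 3))  # 0.951
    # A rarely sampled (e.g. low-abundance) protein needs far more runs.
    print(runs_for_saturation(0.05))             # 59
```

The heterogeneous `expected_coverage` variant shows the abstract's link between sampling and abundance only qualitatively. Proteins with a low per-run detection probability stay unidentified longest, so they set the number of analyses needed. Turning detection probability or spectral counts into quantitative relative abundance would require the paper's own calibration, which is not reproduced here.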

Original language: English (US)
Pages (from-to): 4193-4201
Number of pages: 9
Journal: Analytical Chemistry
Volume: 76
Issue number: 14
DOI: 10.1021/ac0498563
State: Published - Jul 15 2004
Externally published: Yes

Fingerprint

Sampling
Proteins
Ions
Peptides
Liquid chromatography
Mass spectrometers
Yeast
Mass spectrometry
Proteomics
Identification (control systems)
Statistical Models

ASJC Scopus subject areas

  • Analytical Chemistry

Cite this

A model for random sampling and estimation of relative protein abundance in shotgun proteomics. / Liu, Hongbin; Sadygov, Rovshan; Yates, John R.

In: Analytical Chemistry, Vol. 76, No. 14, 15.07.2004, p. 4193-4201.


@article{2f2cd72173304600ae974361a33f2f6b,
title = "A model for random sampling and estimation of relative protein abundance in shotgun proteomics",
abstract = "Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95{\%} saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.",
author = "Hongbin Liu and Rovshan Sadygov and Yates, {John R.}",
year = "2004",
month = "7",
day = "15",
doi = "10.1021/ac0498563",
language = "English (US)",
volume = "76",
pages = "4193--4201",
journal = "Analytical Chemistry",
issn = "0003-2700",
publisher = "American Chemical Society",
number = "14",

}

TY  - JOUR
T1  - A model for random sampling and estimation of relative protein abundance in shotgun proteomics
AU  - Liu, Hongbin
AU  - Sadygov, Rovshan
AU  - Yates, John R.
PY  - 2004/7/15
Y1  - 2004/7/15
N2  - Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.
AB  - Proteomic analysis of complex protein mixtures using proteolytic digestion and liquid chromatography in combination with tandem mass spectrometry is a standard approach in biological studies. Data-dependent acquisition is used to automatically acquire tandem mass spectra of peptides eluting into the mass spectrometer. In more complicated mixtures, for example, whole cell lysates, data-dependent acquisition incompletely samples among the peptide ions present rather than acquiring tandem mass spectra for all ions available. We analyzed the sampling process and developed a statistical model to accurately predict the level of sampling expected for mixtures of a specific complexity. The model also predicts how many analyses are required for saturated sampling of a complex protein mixture. For a yeast-soluble cell lysate 10 analyses are required to reach a 95% saturation level on protein identifications based on our model. The statistical model also suggests a relationship between the level of sampling observed for a protein and the relative abundance of the protein in the mixture. We demonstrate a linear dynamic range over 2 orders of magnitude by using the number of spectra (spectral sampling) acquired for each protein.
UR  - http://www.scopus.com/inward/record.url?scp=3242731195&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=3242731195&partnerID=8YFLogxK
U2  - 10.1021/ac0498563
DO  - 10.1021/ac0498563
M3  - Article
VL  - 76
SP  - 4193
EP  - 4201
JO  - Analytical Chemistry
JF  - Analytical Chemistry
SN  - 0003-2700
IS  - 14
ER  - 