Partially sequenced organisms, decoy searches and false discovery rates

Bjorn Victor, Sarah Gabriël, Kirezi Kanobana, Ekaterina Mostovenko, Katja Polman, Pierre Dorny, André M. Deelder, Magnus Palmblad

Research output: Contribution to journal, Article

4 Citations (Scopus)

Abstract

Tandem mass spectrometry is commonly used to identify peptides, typically by comparing their product ion spectra with those predicted from a protein sequence database and scoring the matches. The most commonly reported quality metric for a set of peptide identifications is the false discovery rate (FDR), the expected fraction of false identifications in the set. This metric has so far been used only for completely sequenced organisms or known protein mixtures. We have investigated whether FDR estimates are also applicable to partially sequenced organisms, where many high-quality spectra fail to identify the correct peptides because those peptides are not present in the searched sequence database. Using real data from human plasma and simulated partial sequence databases derived from two complete human sequence databases with different levels of redundancy, we demonstrate that the mixture model approach in PeptideProphet is robust for partial databases, particularly when used in combination with decoy sequences. We therefore recommend using this method when estimating the FDR and reporting peptide identifications from incompletely sequenced organisms.
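The FDR defined in the abstract can be illustrated with the simplest decoy-based estimate: count the decoy hits that pass a score threshold and divide by the target hits that pass it. The sketch below is only that counting estimate, written under the assumption of a concatenated target-decoy search; it is not the PeptideProphet mixture model that the authors evaluate. The function names, the (score, is_decoy) input format, and the example scores are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical peptide-spectrum matches: (search score, matched a decoy sequence?)
PSM = Tuple[float, bool]

def decoy_fdr(psms: List[PSM], threshold: float) -> float:
    """Estimate the FDR among target hits scoring >= threshold.

    Assumes one best hit per spectrum from a concatenated
    target+decoy database, so decoy hits approximate false target hits.
    """
    targets = sum(1 for score, is_decoy in psms if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= threshold and is_decoy)
    if targets == 0:
        # No target hits: nothing to report if no decoys either, otherwise all false.
        return 0.0 if decoys == 0 else 1.0
    return min(1.0, decoys / targets)

def threshold_for_fdr(psms: List[PSM], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    best = None
    for t in sorted({score for score, _ in psms}, reverse=True):
        if decoy_fdr(psms, t) <= max_fdr:
            best = t
    return best

# Made-up example scores, for illustration only.
example_psms: List[PSM] = [
    (9.1, False), (8.7, False), (7.9, False), (7.2, True), (6.8, False),
    (6.1, False), (5.5, True), (5.0, False), (4.2, True), (3.9, True),
]

if __name__ == "__main__":
    print(decoy_fdr(example_psms, 6.0))
    print(threshold_for_fdr(example_psms, 0.25))
```

With a partial database, many good spectra have no correct sequence to match, which is the situation in which the abstract asks whether such estimates remain reliable.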

Original language: English (US)
Pages (from-to): 1991-1995
Number of pages: 5
Journal: Journal of Proteome Research
Volume: 11
Issue number: 3
DOI: https://doi.org/10.1021/pr201035r
State: Published - Mar 2 2012
Externally published: Yes

Fingerprint

Databases
Peptides
Protein Databases
Plasma (human)
Tandem Mass Spectrometry
Mass spectrometry
Redundancy
Ions
Proteins

Keywords

  • false discovery rates
  • mixture models
  • partially sequenced organism
  • PeptideProphet

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry(all)

Cite this

Victor, B., Gabriël, S., Kanobana, K., Mostovenko, E., Polman, K., Dorny, P., ... Palmblad, M. (2012). Partially sequenced organisms, decoy searches and false discovery rates. Journal of Proteome Research, 11(3), 1991-1995. https://doi.org/10.1021/pr201035r

@article{653cf880daed4684a87096948e3e14ca,
title = "Partially sequenced organisms, decoy searches and false discovery rates",
keywords = "false discovery rates, mixture models, partially sequenced organism, PeptideProphet",
author = "Bjorn Victor and Sarah Gabri{\"e}l and Kirezi Kanobana and Ekaterina Mostovenko and Katja Polman and Pierre Dorny and Deelder, {Andr{\'e} M.} and Magnus Palmblad",
year = "2012",
month = "3",
day = "2",
doi = "10.1021/pr201035r",
language = "English (US)",
volume = "11",
pages = "1991--1995",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "3",

}

TY - JOUR

T1 - Partially sequenced organisms, decoy searches and false discovery rates

AU - Victor, Bjorn

AU - Gabriël, Sarah

AU - Kanobana, Kirezi

AU - Mostovenko, Ekaterina

AU - Polman, Katja

AU - Dorny, Pierre

AU - Deelder, André M.

AU - Palmblad, Magnus

PY - 2012/3/2

Y1 - 2012/3/2

KW - false discovery rates

KW - mixture models

KW - partially sequenced organism

KW - PeptideProphet

UR - http://www.scopus.com/inward/record.url?scp=84857891982&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857891982&partnerID=8YFLogxK

U2 - 10.1021/pr201035r

DO - 10.1021/pr201035r

M3 - Article

C2 - 22339108

AN - SCOPUS:84857891982

VL - 11

SP - 1991

EP - 1995

JO - Journal of Proteome Research

JF - Journal of Proteome Research

SN - 1535-3893

IS - 3

ER -