Charger: Combination of signal processing and statistical learning algorithms for precursor charge-state determination from electron-transfer dissociation spectra

Rovshan Sadygov, Zhiqi Hao, Andreas F R Huhmer

Research output: Contribution to journal (Article)

15 Citations (Scopus)

Abstract

Tandem mass spectrometry in combination with liquid chromatography has emerged as a powerful tool for characterization of complex protein mixtures in a high-throughput manner. One of the bioinformatics challenges posed by mass spectral data analysis is the determination of precursor charge when unit mass resolution is used for detecting fragment ions. The charge-state information is used to filter database sequences before they are correlated to experimental data. In the absence of an accurate charge state, several charge states are assumed, which dramatically increases database search times. To address this problem, we have developed an approach for charge-state determination of peptides from their tandem mass spectra obtained by fragmentation via electron-transfer dissociation (ETD) reactions. Protein analysis by ETD is thought to enhance the range of amino acid sequences that can be analyzed by mass spectrometry-based proteomics. One example is the improved capability to characterize phosphorylated peptides. Our approach to charge-state determination uses a combination of signal processing and statistical machine learning. The signal processing employs correlation and convolution analyses to determine precursor masses and charge states of peptides. We discuss the applicability of these methods to spectra of different charge states. We note that in our applications correlation analysis outperforms convolution in determining peptide charge states. The correlation analysis is best suited for spectra with a prevalence of complementary ions. It is highly specific but depends on the quality of the spectra. The linear discriminant analysis (LDA) approach uses a number of other spectral features to predict charge states. We train the LDA classifier on a set of manually curated spectral data from a mixture of proteins of known identity. There are over 5000 spectra in the training set. A number of features pertinent to spectra of peptides obtained via ETD reactions have been used in the training. The loading coefficients of the LDA indicate the relative importance of different features for charge-state determination. We have applied our model to a test data set generated from a mixture of 49 proteins. We search the spectra with and without use of the charge-state determination. The charge-state determination significantly reduces database search times. We discuss the cost associated with possible misclassification of charge states.
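
The idea that complementary ions constrain the precursor charge can be illustrated with a small sketch. This is not the authors' correlation algorithm: it is a simplified pair-counting stand-in, and the function names, the charge range, the tolerance, and the `pair_offset` value (an assumed mass offset for the m/z sum of a singly charged c ion and its complementary z-radical ion) are all hypothetical choices made for illustration.

```python
PROTON = 1.00728  # proton mass in Da


def count_complementary_pairs(peaks, target_sum, tol=0.5):
    """Count disjoint peak pairs whose m/z values sum to target_sum within tol."""
    mzs = sorted(peaks)
    lo, hi, count = 0, len(mzs) - 1, 0
    # Two-pointer scan over the sorted m/z list.
    while lo < hi:
        pair_sum = mzs[lo] + mzs[hi]
        if abs(pair_sum - target_sum) <= tol:
            count += 1
            lo += 1
            hi -= 1
        elif pair_sum < target_sum:
            lo += 1
        else:
            hi -= 1
    return count


def guess_charge(precursor_mz, peaks, charges=(2, 3, 4, 5, 6),
                 pair_offset=3.0224, tol=0.5):
    """Score each candidate charge by how many complementary pairs it explains.

    pair_offset is an ASSUMPTION: the amount by which the m/z sum of a singly
    charged c / z-radical pair exceeds the neutral peptide mass.
    Returns (best_charge_or_None, scores_by_charge).
    """
    scores = {}
    for z in charges:
        # Neutral mass implied by this candidate charge.
        neutral_mass = z * (precursor_mz - PROTON)
        scores[z] = count_complementary_pairs(peaks, neutral_mass + pair_offset, tol)
    best = max(scores, key=scores.get)
    # No supporting pairs for any candidate: report no decision.
    if scores[best] == 0:
        return None, scores
    return best, scores


# Synthetic example: neutral mass 1500.0 Da observed as a 3+ precursor.
synthetic_peaks = [200.0, 1303.0224, 350.5, 1152.5224, 600.25, 902.7724]
best_charge, charge_scores = guess_charge(1500.0 / 3 + PROTON, synthetic_peaks)
```

On this synthetic spectrum only the 3+ hypothesis finds complementary pairs. As the abstract notes for the correlation analysis, a score of this kind is only informative when complementary ions are actually present, which is why the sketch returns no decision when no pairs are found.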

Original language: English (US)
Pages (from-to): 376-386
Number of pages: 11
Journal: Analytical Chemistry
Volume: 80
Issue number: 2
DOIs: 10.1021/ac071332q
State: Published - Jan 15 2008
Externally published: Yes

Fingerprint

Learning algorithms
Signal processing
Discriminant analysis
Peptides
Electrons
Convolution
Mass spectrometry
Proteins
Ions
Liquid chromatography
Bioinformatics
Learning systems
Classifiers
Throughput
Amino Acids
Costs

ASJC Scopus subject areas

  • Analytical Chemistry

Cite this

Charger: Combination of signal processing and statistical learning algorithms for precursor charge-state determination from electron-transfer dissociation spectra. / Sadygov, Rovshan; Hao, Zhiqi; Huhmer, Andreas F R.

In: Analytical Chemistry, Vol. 80, No. 2, 15.01.2008, p. 376-386.

@article{64e17d2208774899a3bfb3592eb80606,
title = "Charger: Combination of signal processing and statistical learning algorithms for precursor charge-state determination from electron-transfer dissociation spectra",
author = "Rovshan Sadygov and Zhiqi Hao and Huhmer, {Andreas F R}",
year = "2008",
month = "1",
day = "15",
doi = "10.1021/ac071332q",
language = "English (US)",
volume = "80",
pages = "376--386",
journal = "Analytical Chemistry",
issn = "0003-2700",
publisher = "American Chemical Society",
number = "2",

}

TY - JOUR

T1 - Charger

T2 - Combination of signal processing and statistical learning algorithms for precursor charge-state determination from electron-transfer dissociation spectra

AU - Sadygov, Rovshan

AU - Hao, Zhiqi

AU - Huhmer, Andreas F R

PY - 2008/1/15

Y1 - 2008/1/15

UR - http://www.scopus.com/inward/record.url?scp=38349104744&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=38349104744&partnerID=8YFLogxK

U2 - 10.1021/ac071332q

DO - 10.1021/ac071332q

M3 - Article

C2 - 18081262

AN - SCOPUS:38349104744

VL - 80

SP - 376

EP - 386

JO - Analytical Chemistry

JF - Analytical Chemistry

SN - 0003-2700

IS - 2

ER -