Cloud parallel processing of tandem mass spectrometry based proteomics data

Yassene Mohammed, Ekaterina Mostovenko, Alex A. Henneman, Rob J. Marissen, André M. Deelder, Magnus Palmblad

Research output: Contribution to journal › Article

Abstract

Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
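The abstract describes how the search is parallelized. The mzXML input is decomposed, an unmodified search engine runs on each part, and the resulting pepXML files are recomposed. The paper's own two algorithms are not reproduced here. The Python sketch below only illustrates the general data-decomposition idea under several assumptions:

- The mzXML-like and pepXML-like XML is simplified and namespace-free, with a flat list of scans and no index, offsets or checksums.
- A hypothetical `fake_search` function stands in for a real engine such as X!Tandem or SpectraST.
- The names `decompose`, `recompose` and `parallel_search` are also illustrative only.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy


def decompose(mzxml_text: str, n_parts: int) -> list[str]:
    """Split the <scan> children of <msRun> into n_parts contiguous blocks.

    Assumes a simplified, namespace-free mzXML-like document with a flat
    list of scans and no <index>/<indexOffset>/<sha1> elements.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    root = ET.fromstring(mzxml_text)
    run = root.find("msRun")
    scans = run.findall("scan")
    size, extra = divmod(len(scans), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` blocks receive one additional scan each.
        end = start + size + (1 if i < extra else 0)
        part_root = ET.Element(root.tag, dict(root.attrib))
        part_run = ET.SubElement(part_root, "msRun", dict(run.attrib))
        part_run.set("scanCount", str(end - start))
        for scan in scans[start:end]:
            part_run.append(deepcopy(scan))
        parts.append(ET.tostring(part_root, encoding="unicode"))
        start = end
    return parts


def fake_search(mzxml_part: str) -> str:
    """Hypothetical stand-in for one unmodified search-engine run.

    Emits one simplified pepXML-like <spectrum_query> per scan; a real
    engine would be launched as an external process writing pepXML.
    """
    root = ET.fromstring(mzxml_part)
    out = ET.Element("msms_pipeline_analysis")
    summary = ET.SubElement(out, "msms_run_summary")
    for scan in root.iter("scan"):
        num = scan.get("num")
        ET.SubElement(summary, "spectrum_query",
                      {"spectrum": f"run.{num}.{num}", "start_scan": num,
                       "end_scan": num, "index": "0"})
    return ET.tostring(out, encoding="unicode")


def recompose(pepxml_parts: list[str]) -> str:
    """Merge <spectrum_query> elements from all parts, ordered by start_scan."""
    queries = []
    for text in pepxml_parts:
        queries.extend(ET.fromstring(text).iter("spectrum_query"))
    queries.sort(key=lambda q: int(q.get("start_scan")))
    merged = ET.Element("msms_pipeline_analysis")
    summary = ET.SubElement(merged, "msms_run_summary")
    for index, query in enumerate(queries, start=1):
        query = deepcopy(query)
        query.set("index", str(index))  # renumber so indices are unique again
        summary.append(query)
    return ET.tostring(merged, encoding="unicode")


def parallel_search(mzxml_text: str, n_parts: int, search=fake_search) -> str:
    """Decompose the input, search the parts concurrently, then recompose."""
    parts = decompose(mzxml_text, n_parts)
    # Threads are enough when each search is an external process, because
    # the Python side only waits for the engine to finish.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(search, parts))
    return recompose(results)


# Usage: seven MS/MS scans split across three parallel "searches".
doc = ("<mzXML><msRun scanCount='7'>"
       + "".join(f"<scan num='{i}' msLevel='2'/>" for i in range(1, 8))
       + "</msRun></mzXML>")
merged = ET.fromstring(parallel_search(doc, 3))
print([q.get("start_scan") for q in merged.iter("spectrum_query")])
# ['1', '2', '3', '4', '5', '6', '7']
```

Splitting into contiguous scan blocks keeps each part usable as a standalone input. Sorting the merged queries by `start_scan` and assigning fresh `index` values makes the recomposed output independent of the order in which the parts finish.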

Original language: English (US)
Pages (from-to): 5101-5108
Number of pages: 8
Journal: Journal of Proteome Research
Volume: 11
Issue number: 10
DOI: 10.1021/pr300561q
State: Published - Oct 5 2012
Externally published: Yes

Keywords

  • data decomposition
  • mass spectrometry
  • proteomics
  • scientific workflow

ASJC Scopus subject areas

  • Biochemistry
  • Chemistry(all)

Cite this

Mohammed, Y., Mostovenko, E., Henneman, A. A., Marissen, R. J., Deelder, A. M., & Palmblad, M. (2012). Cloud parallel processing of tandem mass spectrometry based proteomics data. Journal of Proteome Research, 11(10), 5101-5108. https://doi.org/10.1021/pr300561q

Cloud parallel processing of tandem mass spectrometry based proteomics data. / Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A.; Marissen, Rob J.; Deelder, André M.; Palmblad, Magnus.

In: Journal of Proteome Research, Vol. 11, No. 10, 05.10.2012, p. 5101-5108.

Mohammed, Y, Mostovenko, E, Henneman, AA, Marissen, RJ, Deelder, AM & Palmblad, M 2012, 'Cloud parallel processing of tandem mass spectrometry based proteomics data', Journal of Proteome Research, vol. 11, no. 10, pp. 5101-5108. https://doi.org/10.1021/pr300561q

Mohammed Y, Mostovenko E, Henneman AA, Marissen RJ, Deelder AM, Palmblad M. Cloud parallel processing of tandem mass spectrometry based proteomics data. Journal of Proteome Research. 2012 Oct 5;11(10):5101-5108. https://doi.org/10.1021/pr300561q

Mohammed, Yassene ; Mostovenko, Ekaterina ; Henneman, Alex A. ; Marissen, Rob J. ; Deelder, André M. ; Palmblad, Magnus. / Cloud parallel processing of tandem mass spectrometry based proteomics data. In: Journal of Proteome Research. 2012 ; Vol. 11, No. 10. pp. 5101-5108.

@article{a512d4e9554a404888f35beb119c6b8d,
title = "Cloud parallel processing of tandem mass spectrometry based proteomics data",
abstract = "Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.",
keywords = "data decomposition, mass spectrometry, proteomics, scientific workflow",
author = "Yassene Mohammed and Ekaterina Mostovenko and Henneman, {Alex A.} and Marissen, {Rob J.} and Deelder, {Andr{\'e} M.} and Magnus Palmblad",
year = "2012",
month = "10",
day = "5",
doi = "10.1021/pr300561q",
language = "English (US)",
volume = "11",
pages = "5101--5108",
journal = "Journal of Proteome Research",
issn = "1535-3893",
publisher = "American Chemical Society",
number = "10",

}

TY  - JOUR
T1  - Cloud parallel processing of tandem mass spectrometry based proteomics data
AU  - Mohammed, Yassene
AU  - Mostovenko, Ekaterina
AU  - Henneman, Alex A.
AU  - Marissen, Rob J.
AU  - Deelder, André M.
AU  - Palmblad, Magnus
PY  - 2012/10/5
Y1  - 2012/10/5
N2  - Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
AB  - Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
KW  - data decomposition
KW  - mass spectrometry
KW  - proteomics
KW  - scientific workflow
UR  - http://www.scopus.com/inward/record.url?scp=84867441831&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84867441831&partnerID=8YFLogxK
U2  - 10.1021/pr300561q
DO  - 10.1021/pr300561q
M3  - Article
VL  - 11
SP  - 5101
EP  - 5108
JO  - Journal of Proteome Research
JF  - Journal of Proteome Research
SN  - 1535-3893
IS  - 10
ER  - 