A parallel method for enumerating amino acid compositions and masses of all theoretical peptides

Alexey V. Nefedov, Rovshan Sadygov

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Background: Enumeration of all theoretically possible amino acid compositions is an important problem in several proteomics workflows, including peptide mass fingerprinting, mass defect labeling, mass defect filtering, and de novo peptide sequencing. Because of the high computational complexity of this task, reported methods for peptide enumeration were restricted to cover limited mass ranges (below 2 kDa). In addition, implementation details of these methods as well as their computational performance have not been provided. The increasing availability of parallel (multi-core) computers in all fields of research makes the development of parallel methods for peptide enumeration a timely topic.

Results: We describe a parallel method for enumerating all amino acid compositions up to a given length. We present recursive procedures which are at the core of the method, and show that a single task of enumeration of all peptide compositions can be divided into smaller subtasks that can be executed in parallel. The computational complexity of the subtasks is compared with the computational complexity of the whole task. Pseudocodes of processes (a master and workers) that are used to execute the enumerating procedure in parallel are given. We present computational times for our method executed on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores) running Windows HPC Server. Our method has been implemented as a 32- and 64-bit Windows application using Microsoft Visual C++ and the Message Passing Interface. It is available for download at https://ispace.utmb.edu/users/rgsadygo/Proteomics/ParallelMethod.

Conclusion: We describe implementation of a parallel method for generating mass distributions of all theoretically possible amino acid compositions.
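The abstract's central idea (a recursive enumeration that splits into independent subtasks) can be illustrated with a minimal Python sketch. This is not the authors' C++/MPI implementation: the three-residue alphabet, the function names (extend, subtask, enumerate_all, mass), and the choice to split by the first residue are assumptions made for illustration. Actual parallel execution by a master and workers is not shown.

```python
from itertools import chain

# Monoisotopic residue masses (Da) for a small illustrative alphabet.
# A real run would use all 20 amino acids; this subset is an assumption.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203}
WATER = 18.01056  # added once per peptide for the terminal H and OH

RESIDUES = sorted(RESIDUE_MASS)  # fixed order: ['A', 'G', 'S']


def extend(prefix, start, max_len):
    """Recursively yield every composition that begins with `prefix`.

    Residues are appended in non-decreasing index order, so each
    multiset of residues is produced exactly once.
    """
    yield prefix
    if len(prefix) == max_len:
        return
    for i in range(start, len(RESIDUES)):
        yield from extend(prefix + RESIDUES[i], i, max_len)


def subtask(first, max_len):
    """Hypothetical unit of work: compositions whose lowest residue is `first`."""
    return list(extend(RESIDUES[first], first, max_len))


def enumerate_all(max_len):
    """The whole task is the concatenation of the independent subtasks."""
    parts = (subtask(i, max_len) for i in range(len(RESIDUES)))
    return list(chain.from_iterable(parts))


def mass(composition):
    """Monoisotopic mass of a peptide with the given composition."""
    return WATER + sum(RESIDUE_MASS[r] for r in composition)


if __name__ == "__main__":
    for comp in enumerate_all(2):
        print(comp, round(mass(comp), 5))
```

Because the subtasks share no state, each call to subtask could in principle be handed to a separate worker. Note that the subtasks differ in size (the one starting from the first residue in the ordering is the largest), which is one reason to compare subtask cost against the cost of the whole task.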

Original language: English (US)
Article number: 432
Journal: BMC Bioinformatics
Volume: 12
DOI: 10.1186/1471-2105-12-432
State: Published - Nov 7 2011

Fingerprint

Parallel Methods
Peptides
Amino Acids
Enumeration
Computational complexity
Chemical analysis
Proteomics
Defects
Message Passing Interface
Fingerprinting
Message passing
Computational methods
C++
Labeling
Sequencing
Work Flow
Program processors
Servers

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

A parallel method for enumerating amino acid compositions and masses of all theoretical peptides. / Nefedov, Alexey V.; Sadygov, Rovshan.

In: BMC Bioinformatics, Vol. 12, 432, 07.11.2011.

@article{6bdfdbd1a39e4077b2ffd353ac8a35d8,
title = "A parallel method for enumerating amino acid compositions and masses of all theoretical peptides",
abstract = "Background: Enumeration of all theoretically possible amino acid compositions is an important problem in several proteomics workflows, including peptide mass fingerprinting, mass defect labeling, mass defect filtering, and de novo peptide sequencing. Because of the high computational complexity of this task, reported methods for peptide enumeration were restricted to cover limited mass ranges (below 2 kDa). In addition, implementation details of these methods as well as their computational performance have not been provided. The increasing availability of parallel (multi-core) computers in all fields of research makes the development of parallel methods for peptide enumeration a timely topic. Results: We describe a parallel method for enumerating all amino acid compositions up to a given length. We present recursive procedures which are at the core of the method, and show that a single task of enumeration of all peptide compositions can be divided into smaller subtasks that can be executed in parallel. The computational complexity of the subtasks is compared with the computational complexity of the whole task. Pseudocodes of processes (a master and workers) that are used to execute the enumerating procedure in parallel are given. We present computational times for our method executed on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores) running Windows HPC Server. Our method has been implemented as a 32- and 64-bit Windows application using Microsoft Visual C++ and the Message Passing Interface. It is available for download at https://ispace.utmb.edu/users/rgsadygo/Proteomics/ParallelMethod. Conclusion: We describe implementation of a parallel method for generating mass distributions of all theoretically possible amino acid compositions.",
author = "Nefedov, Alexey V. and Sadygov, Rovshan",
year = "2011",
month = "11",
day = "7",
doi = "10.1186/1471-2105-12-432",
language = "English (US)",
volume = "12",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
}

TY - JOUR

T1 - A parallel method for enumerating amino acid compositions and masses of all theoretical peptides

AU - Nefedov, Alexey V.

AU - Sadygov, Rovshan

PY - 2011/11/7

Y1 - 2011/11/7

N2 - Background: Enumeration of all theoretically possible amino acid compositions is an important problem in several proteomics workflows, including peptide mass fingerprinting, mass defect labeling, mass defect filtering, and de novo peptide sequencing. Because of the high computational complexity of this task, reported methods for peptide enumeration were restricted to cover limited mass ranges (below 2 kDa). In addition, implementation details of these methods as well as their computational performance have not been provided. The increasing availability of parallel (multi-core) computers in all fields of research makes the development of parallel methods for peptide enumeration a timely topic. Results: We describe a parallel method for enumerating all amino acid compositions up to a given length. We present recursive procedures which are at the core of the method, and show that a single task of enumeration of all peptide compositions can be divided into smaller subtasks that can be executed in parallel. The computational complexity of the subtasks is compared with the computational complexity of the whole task. Pseudocodes of processes (a master and workers) that are used to execute the enumerating procedure in parallel are given. We present computational times for our method executed on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores) running Windows HPC Server. Our method has been implemented as a 32- and 64-bit Windows application using Microsoft Visual C++ and the Message Passing Interface. It is available for download at https://ispace.utmb.edu/users/rgsadygo/Proteomics/ParallelMethod. Conclusion: We describe implementation of a parallel method for generating mass distributions of all theoretically possible amino acid compositions.

AB - Background: Enumeration of all theoretically possible amino acid compositions is an important problem in several proteomics workflows, including peptide mass fingerprinting, mass defect labeling, mass defect filtering, and de novo peptide sequencing. Because of the high computational complexity of this task, reported methods for peptide enumeration were restricted to cover limited mass ranges (below 2 kDa). In addition, implementation details of these methods as well as their computational performance have not been provided. The increasing availability of parallel (multi-core) computers in all fields of research makes the development of parallel methods for peptide enumeration a timely topic. Results: We describe a parallel method for enumerating all amino acid compositions up to a given length. We present recursive procedures which are at the core of the method, and show that a single task of enumeration of all peptide compositions can be divided into smaller subtasks that can be executed in parallel. The computational complexity of the subtasks is compared with the computational complexity of the whole task. Pseudocodes of processes (a master and workers) that are used to execute the enumerating procedure in parallel are given. We present computational times for our method executed on a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores) running Windows HPC Server. Our method has been implemented as a 32- and 64-bit Windows application using Microsoft Visual C++ and the Message Passing Interface. It is available for download at https://ispace.utmb.edu/users/rgsadygo/Proteomics/ParallelMethod. Conclusion: We describe implementation of a parallel method for generating mass distributions of all theoretically possible amino acid compositions.

UR - http://www.scopus.com/inward/record.url?scp=80355132686&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80355132686&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-12-432

DO - 10.1186/1471-2105-12-432

M3 - Article

C2 - 22059886

AN - SCOPUS:80355132686

VL - 12

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 432

ER -