Using SEQUEST with Theoretically Complete Sequence Databases

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


SEQUEST has long been used to identify peptides and proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven hugely successful for its sensitivity and specificity in identifying peptides and proteins whose sequences are present in the protein sequence databases. Here, we report a new use for the algorithm: searching a complete list of theoretically possible peptides, an approach akin to de novo sequencing. We used freely available mass spectral data and determined a number of unique peptides identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm to determine all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search all theoretically possible peptide sequences and identify the best matches to the spectra. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides. [Figure not available: see fulltext.]
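The database construction described in the abstract (amino acid compositions within a mass window, then a lexicographic ordering of each composition) can be illustrated with a small sketch. This is a brute-force illustration under stated assumptions, not the authors' published algorithm: the function names, the integer mass scaling, and the toy peptide GAS are hypothetical choices made for this example. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in units of 1e-5 Da (integers avoid float drift).
RESIDUE_MASS = {
    "G": 5702146, "A": 7103711, "S": 8703203, "P": 9705276, "V": 9906841,
    "T": 10104768, "C": 10300919, "L": 11308406, "I": 11308406, "N": 11404293,
    "D": 11502694, "Q": 12805858, "K": 12809496, "E": 12904259, "M": 13104049,
    "H": 13705891, "F": 14706841, "R": 15610111, "Y": 16306333, "W": 18607931,
}
WATER = 1801056  # one water per intact peptide, same units
RESIDUES = sorted(RESIDUE_MASS)  # alphabetical residue order


def peptide_mass(seq):
    """Neutral monoisotopic mass of a peptide, in units of 1e-5 Da."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER


def compositions(lo, hi, start=0, prefix=()):
    """Yield residue multisets whose summed residue mass lies in [lo, hi].

    Hypothetical brute-force helper: residues are added in non-decreasing
    alphabetical index order, so each multiset is produced exactly once.
    """
    if lo <= 0 <= hi:
        yield prefix
    for i in range(start, len(RESIDUES)):
        m = RESIDUE_MASS[RESIDUES[i]]
        if m <= hi:
            yield from compositions(lo - m, hi - m, i, prefix + (RESIDUES[i],))


def lexicographic_sequences(composition):
    """Yield every distinct ordering of a residue multiset, lexicographically."""
    seq = sorted(composition)
    while True:
        yield "".join(seq)
        # Next-permutation step: find the rightmost position that can grow.
        i = len(seq) - 2
        while i >= 0 and seq[i] >= seq[i + 1]:
            i -= 1
        if i < 0:
            return
        j = len(seq) - 1
        while seq[j] <= seq[i]:
            j -= 1
        seq[i], seq[j] = seq[j], seq[i]
        seq[i + 1:] = reversed(seq[i + 1:])


def candidate_peptides(precursor_mass, tol=100):
    """All sequences within +/- tol (default 0.001 Da) of a precursor mass."""
    target = precursor_mass - WATER
    for comp in compositions(target - tol, target + tol):
        yield from lexicographic_sequences(comp)


if __name__ == "__main__":
    for candidate in candidate_peptides(peptide_mass("GAS")):
        print(candidate)
```

Even for this three-residue toy peptide, the candidate list contains sequences from other compositions with the same mass (for example, orderings of G, G, T), which hints at why a theoretically complete database is far larger than a protein sequence database. Leucine and isoleucine are isobaric, so a sketch like this would emit both variants for any peptide containing either.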

Original language: English (US)
Pages (from-to): 1858-1864
Number of pages: 7
Journal: Journal of the American Society for Mass Spectrometry
Issue number: 11
State: Published - Nov 1 2015


Keywords

  • All theoretically possible peptides
  • De novo Peptide sequencing
  • Mass distribution of peptides

ASJC Scopus subject areas

  • Structural Biology
  • Spectroscopy

