TY - JOUR
T1 - Using SEQUEST with Theoretically Complete Sequence Databases
AU - Sadygov, Rovshan G.
N1 - Publisher Copyright: © 2015 American Society for Mass Spectrometry.
PY - 2015/11/1
Y1 - 2015/11/1
N2 - SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven hugely successful for its sensitivity and specificity in identifying peptides/proteins whose sequences are present in the protein sequence databases. In this work, we report a new use for the algorithm: searching a complete list of theoretically possible peptides, a de novo-like sequencing approach. We used freely available mass spectral data and determined a number of unique peptides as identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search and identify the best matches to the spectra from all theoretically possible peptide sequences. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides.
AB - SEQUEST has long been used to identify peptides/proteins from their tandem mass spectra and protein sequence databases. The algorithm has proven hugely successful for its sensitivity and specificity in identifying peptides/proteins whose sequences are present in the protein sequence databases. In this work, we report a new use for the algorithm: searching a complete list of theoretically possible peptides, a de novo-like sequencing approach. We used freely available mass spectral data and determined a number of unique peptides as identified by SEQUEST. Using the masses of these peptides and a mass accuracy of 0.001 Da, we created a database of all theoretically possible peptide sequences corresponding to the precursor masses. We used our recently developed algorithm for determining all amino acid compositions corresponding to a mass interval, and used a lexicographic ordering to generate theoretical sequences from the compositions. The newly generated theoretical database was many-fold more complex than the original protein sequence database. We used SEQUEST to search and identify the best matches to the spectra from all theoretically possible peptide sequences. We found that the SEQUEST cross-correlation score ranked the correct peptide match among the top sequence matches. The results testify to the high specificity of SEQUEST when combined with high mass accuracy for intact peptides.
KW - All theoretically possible peptides
KW - De novo Peptide sequencing
KW - Mass distribution of peptides
KW - SEQUEST
UR - http://www.scopus.com/inward/record.url?scp=84958176451&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84958176451&partnerID=8YFLogxK
U2 - 10.1007/s13361-015-1228-5
DO - 10.1007/s13361-015-1228-5
M3 - Article
C2 - 26238326
AN - SCOPUS:84958176451
SN - 1044-0305
VL - 26
SP - 1858
EP - 1864
JO - Journal of the American Society for Mass Spectrometry
JF - Journal of the American Society for Mass Spectrometry
IS - 11
ER -