Using statistical properties of short subsequences in microbial identification

Sergei Chumakov, Catherine Putonti, Bernard Pettitt, George Fox, Richard C. Willson, Yuriy Fofanov

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

We performed a comparative analysis of the presence/absence distributions of short subsequences of different lengths ("n-mers", n = 5-20) in more than 100 microbial genomes. Our results show that, for organisms that are not close relatives, the presence or absence of a given 10-20-mer in one genome is not correlated with its presence or absence in the other. For close biological relatives, some correlation in the presence of n-mers appears, but it is not as strong as expected. Because these correlations are suppressed, random sets of n-mers (with appropriately chosen n) can be used to discriminate the genomes of different organisms with a low probability of error. We performed in silico experiments demonstrating that the presence/absence pattern of 1000 random oligomers of length 12-13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other.
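The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes synthetic random sequences in place of real genomes, a toy scale (8-mers on 20 kb sequences rather than 12-13-mers on bacterial genomes), a simple point-substitution model for a "close relative", and Hamming distance as the comparison measure, none of which is specified in the abstract. All function names are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def random_sequence(length, rng):
    """Uniform i.i.d. sequence; a stand-in for a real genome (assumption)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

def mutate(genome, rate, rng):
    """Toy 'close relative': redraw each base with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base
                   for base in genome)

def fingerprint(genome, probes):
    """Presence/absence pattern of each probe n-mer in the genome."""
    n = len(probes[0])
    present = {genome[i:i + n] for i in range(len(genome) - n + 1)}
    return [probe in present for probe in probes]

def hamming(a, b):
    """Number of probes whose presence/absence differs between two patterns."""
    return sum(x != y for x, y in zip(a, b))

def expected_presence(genome_length, n):
    """Chance that a random n-mer occurs at least once in a uniform random
    sequence (single strand, Poisson approximation)."""
    return 1.0 - math.exp(-(genome_length - n + 1) / 4 ** n)

if __name__ == "__main__":
    rng = random.Random(0)
    n, length, n_probes = 8, 20_000, 1000
    probes = [random_sequence(n, rng) for _ in range(n_probes)]

    genome_a = random_sequence(length, rng)
    genome_b = random_sequence(length, rng)      # unrelated organism
    genome_a2 = mutate(genome_a, 0.01, rng)      # close relative of A

    fp_a = fingerprint(genome_a, probes)
    print("presence probability (model):", round(expected_presence(length, n), 3))
    print("distance A vs unrelated B:", hamming(fp_a, fingerprint(genome_b, probes)))
    print("distance A vs relative A2:", hamming(fp_a, fingerprint(genome_a2, probes)))
```

Under this random-sequence model, n is "appropriately chosen" when the presence probability is neither close to 0 nor close to 1, so that each probe carries useful information; unrelated sequences then differ on a large fraction of the probes, while a close relative differs on far fewer.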

Original language: English (US)
Title of host publication: Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04
Editors: F. Valafar, H. Valafar
Pages: 363-367
Number of pages: 5
State: Published - 2004
Externally published: Yes
Event: Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04 - Las Vegas, NV, United States
Duration: Jun 21 2004 to Jun 24 2004

Other

Other: Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04
Country: United States
City: Las Vegas, NV
Period: 6/21/04 to 6/24/04

Fingerprint

Genes
Oligomers
Experiments

Keywords

  • Microarray
  • Pathogen identification

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Chumakov, S., Putonti, C., Pettitt, B., Fox, G., Willson, R. C., & Fofanov, Y. (2004). Using statistical properties of short subsequences in microbial identification. In F. Valafar, & H. Valafar (Eds.), Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04 (pp. 363-367)

Using statistical properties of short subsequences in microbial identification. / Chumakov, Sergei; Putonti, Catherine; Pettitt, Bernard; Fox, George; Willson, Richard C.; Fofanov, Yuriy.

Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04. ed. / F. Valafar; H. Valafar. 2004. p. 363-367.

Chumakov, S, Putonti, C, Pettitt, B, Fox, G, Willson, RC & Fofanov, Y 2004, Using statistical properties of short subsequences in microbial identification. in F Valafar & H Valafar (eds), Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04. pp. 363-367, Proceedings of the International Conference on Mathematics and Engineering Techniques in medicine and Biological Sciences, METMBS'04, Las Vegas, NV, United States, 6/21/04.
Chumakov S, Putonti C, Pettitt B, Fox G, Willson RC, Fofanov Y. Using statistical properties of short subsequences in microbial identification. In Valafar F, Valafar H, editors, Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04. 2004. p. 363-367
Chumakov, Sergei ; Putonti, Catherine ; Pettitt, Bernard ; Fox, George ; Willson, Richard C. ; Fofanov, Yuriy. / Using statistical properties of short subsequences in microbial identification. Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04. editor / F. Valafar ; H. Valafar. 2004. pp. 363-367
@inproceedings{c465416d5888428c8db872bdd9563ef1,
title = "Using statistical properties of short subsequences in microbial identification",
abstract = "The comparative analysis of distributions of the presence/absence of short subsequences of different length ({"}n-mers{"}, n = 5 - 20) in more than 100 microbial genomes has been performed. Our results show that for organisms, which are not close relatives of each other, the presence/absence of different 10-20-mers in their genomes are not correlated. For close biological relatives, some correlation of the presence of n-mers appears, but is not as strong as expected. Suppressed correlations among the n-mers present in different genomes lead to the possibility of using random sets of n-mers (with appropriately chosen n) to discriminate genomes of different organisms with a low probability of error. We have performed in silico experiments to demonstrate that the presence/absence pattern of 1000 random oligomers of length 12-13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other.",
keywords = "Microarray, Pathogen identification",
author = "Sergei Chumakov and Catherine Putonti and Bernard Pettitt and George Fox and Willson, {Richard C.} and Yuriy Fofanov",
year = "2004",
language = "English (US)",
isbn = "1932415432",
pages = "363--367",
editor = "F. Valafar and H. Valafar",
booktitle = "Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04",

}

TY - GEN

T1 - Using statistical properties of short subsequences in microbial identification

AU - Chumakov, Sergei

AU - Putonti, Catherine

AU - Pettitt, Bernard

AU - Fox, George

AU - Willson, Richard C.

AU - Fofanov, Yuriy

PY - 2004

Y1 - 2004

N2 - The comparative analysis of distributions of the presence/absence of short subsequences of different length ("n-mers", n = 5 - 20) in more than 100 microbial genomes has been performed. Our results show that for organisms, which are not close relatives of each other, the presence/absence of different 10-20-mers in their genomes are not correlated. For close biological relatives, some correlation of the presence of n-mers appears, but is not as strong as expected. Suppressed correlations among the n-mers present in different genomes lead to the possibility of using random sets of n-mers (with appropriately chosen n) to discriminate genomes of different organisms with a low probability of error. We have performed in silico experiments to demonstrate that the presence/absence pattern of 1000 random oligomers of length 12-13 in a bacterial genome is sufficiently characteristic to readily and unambiguously distinguish any known bacterial genome from any other.

KW - Microarray

KW - Pathogen identification

UR - http://www.scopus.com/inward/record.url?scp=11144300816&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=11144300816&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:11144300816

SN - 1932415432

SN - 9781932415438

SP - 363

EP - 367

BT - Proceedings of the International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS'04

A2 - Valafar, F.

A2 - Valafar, H.

ER -