Using mutual information to discover temporal patterns in gene expression data

Sergei Chumakov, Efren Ballesteros, Jorge E. Rodriguez Sanchez, Arturo Chavez, Meizhuo Zhang, Bernard Pettitt, Yuriy Fofanov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Finding relations among gene expressions involves defining a measure of similarity between experimental data. The simplest similarity measure is the correlation coefficient. It can identify only linear dependences and, moreover, is sensitive to experimental errors. An alternative measure, the Shannon mutual information (MI), is free from these weaknesses. However, calculating MI for continuous variables from a finite number N of experimental points involves an ambiguity that arises when the range of values of a continuous variable is divided into boxes: the distribution of experimental points among the boxes (and, therefore, MI) depends on the box size. We propose an algorithm for calculating MI for continuous variables. The optimum box sizes for a given N are found from the condition of minimum entropy variation with respect to changes in the box sizes. We applied this technique to the Stanford gene expression dataset, which contains microarray data at 18 time points from cultures of the yeast Saccharomyces cerevisiae (Spellman et al.). We calculated MI for all pairs of time points. The MI analysis allowed us to identify time patterns related to different biological processes in the cell.
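The two claims the abstract makes about box-counting MI (that it detects dependences the correlation coefficient misses, and that its value depends on the box size) can be checked on toy data. The sketch below is illustrative only and is not the authors' algorithm: it assumes equal-width boxes, the same number of boxes k on both axes, and plug-in probabilities (the fraction of the N points in each box). The function names `entropy_bits` and `box_mutual_information` are hypothetical. In the abstract's setting, x and y would presumably be the expression levels of all genes at two time points; here they are synthetic. The authors' rule for picking the optimum box size from the minimum entropy variation is not reproduced.

```python
import numpy as np


def entropy_bits(counts: np.ndarray) -> float:
    """Plug-in Shannon entropy, in bits, of a histogram given as box counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    p = counts[counts > 0] / counts.sum()  # empty boxes contribute nothing
    return float(-np.sum(p * np.log2(p)))


def box_mutual_information(x: np.ndarray, y: np.ndarray, k: int) -> float:
    """Estimate MI(X;Y) in bits using k equal-width boxes per axis.

    Uses MI = H(X) + H(Y) - H(X,Y), where each probability is the
    fraction of the N points that falls in a box.
    """
    joint, _, _ = np.histogram2d(x, y, bins=k)
    h_x = entropy_bits(joint.sum(axis=1))  # marginal box counts for X
    h_y = entropy_bits(joint.sum(axis=0))  # marginal box counts for Y
    h_xy = entropy_bits(joint)             # joint box counts
    return h_x + h_y - h_xy


if __name__ == "__main__":
    # Synthetic, noise-free data (not from the Spellman dataset):
    # y is fully determined by x, but the dependence is not linear.
    x = np.linspace(-1.0, 1.0, 201)
    y = x ** 2

    r = np.corrcoef(x, y)[0, 1]
    print(f"Pearson r = {r:.3f}")
    for k in (2, 4, 8):
        print(f"k = {k}: MI = {box_mutual_information(x, y, k):.3f} bits")
```

For these data the correlation coefficient is essentially zero, while the MI estimate is close to zero with k = 2 (both halves of the x range map onto the same mix of y boxes) and about 1 bit with k = 4. The same points therefore yield very different MI values depending on the box size, which is the ambiguity the abstract describes. With many boxes and few points, most boxes hold zero or one point and the plug-in estimate becomes biased upward, so some principled choice of box size for a given N is needed.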

Original language: English (US)
Title of host publication: AIP Conference Proceedings
Pages: 25-30
Number of pages: 6
Volume: 854
DOI: https://doi.org/10.1063/1.2356392
State: Published - 2006
Externally published: Yes
Event: 9th Mexican Symposium on Medical Physics - Guadalajara, Jalisco, Mexico
Duration: Mar 18 2006 → Mar 23 2006

Other

Other: 9th Mexican Symposium on Medical Physics
Country: Mexico
City: Guadalajara, Jalisco
Period: 3/18/06 → 3/23/06

Fingerprint

gene expression
boxes
information analysis
saccharomyces
yeast
correlation coefficients
ambiguity
entropy
cells

Keywords

  • Gene expression
  • Mutual information

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Chumakov, S., Ballesteros, E., Rodriguez Sanchez, J. E., Chavez, A., Zhang, M., Pettitt, B., & Fofanov, Y. (2006). Using mutual information to discover temporal patterns in gene expression data. In AIP Conference Proceedings (Vol. 854, pp. 25-30). https://doi.org/10.1063/1.2356392

Using mutual information to discover temporal patterns in gene expression data. / Chumakov, Sergei; Ballesteros, Efren; Rodriguez Sanchez, Jorge E.; Chavez, Arturo; Zhang, Meizhuo; Pettitt, Bernard; Fofanov, Yuriy.

AIP Conference Proceedings. Vol. 854 2006. p. 25-30.

Chumakov, S, Ballesteros, E, Rodriguez Sanchez, JE, Chavez, A, Zhang, M, Pettitt, B & Fofanov, Y 2006, Using mutual information to discover temporal patterns in gene expression data. in AIP Conference Proceedings. vol. 854, pp. 25-30, 9th Mexican Symposium on Medical Physics, Guadalajara, Jalisco, Mexico, 3/18/06. https://doi.org/10.1063/1.2356392
Chumakov S, Ballesteros E, Rodriguez Sanchez JE, Chavez A, Zhang M, Pettitt B et al. Using mutual information to discover temporal patterns in gene expression data. In AIP Conference Proceedings. Vol. 854. 2006. p. 25-30 https://doi.org/10.1063/1.2356392
Chumakov, Sergei ; Ballesteros, Efren ; Rodriguez Sanchez, Jorge E. ; Chavez, Arturo ; Zhang, Meizhuo ; Pettitt, Bernard ; Fofanov, Yuriy. / Using mutual information to discover temporal patterns in gene expression data. AIP Conference Proceedings. Vol. 854 2006. pp. 25-30
@inproceedings{18bbfd3a1b4a455b8369c33f3fe7b05e,
title = "Using mutual information to discover temporal patterns in gene expression data",
abstract = "Finding relations among gene expressions involves defining a measure of similarity between experimental data. The simplest similarity measure is the correlation coefficient. It can identify only linear dependences and, moreover, is sensitive to experimental errors. An alternative measure, the Shannon mutual information (MI), is free from these weaknesses. However, calculating MI for continuous variables from a finite number N of experimental points involves an ambiguity that arises when the range of values of a continuous variable is divided into boxes: the distribution of experimental points among the boxes (and, therefore, MI) depends on the box size. We propose an algorithm for calculating MI for continuous variables. The optimum box sizes for a given N are found from the condition of minimum entropy variation with respect to changes in the box sizes. We applied this technique to the Stanford gene expression dataset, which contains microarray data at 18 time points from cultures of the yeast Saccharomyces cerevisiae (Spellman et al.). We calculated MI for all pairs of time points. The MI analysis allowed us to identify time patterns related to different biological processes in the cell.",
keywords = "Gene expression, Mutual information",
author = "Sergei Chumakov and Efren Ballesteros and {Rodriguez Sanchez}, {Jorge E.} and Arturo Chavez and Meizhuo Zhang and Bernard Pettitt and Yuriy Fofanov",
year = "2006",
doi = "10.1063/1.2356392",
language = "English (US)",
volume = "854",
pages = "25--30",
booktitle = "AIP Conference Proceedings",

}

TY  - GEN
T1  - Using mutual information to discover temporal patterns in gene expression data
AU  - Chumakov, Sergei
AU  - Ballesteros, Efren
AU  - Rodriguez Sanchez, Jorge E.
AU  - Chavez, Arturo
AU  - Zhang, Meizhuo
AU  - Pettitt, Bernard
AU  - Fofanov, Yuriy
PY  - 2006
Y1  - 2006
N2  - Finding relations among gene expressions involves defining a measure of similarity between experimental data. The simplest similarity measure is the correlation coefficient. It can identify only linear dependences and, moreover, is sensitive to experimental errors. An alternative measure, the Shannon mutual information (MI), is free from these weaknesses. However, calculating MI for continuous variables from a finite number N of experimental points involves an ambiguity that arises when the range of values of a continuous variable is divided into boxes: the distribution of experimental points among the boxes (and, therefore, MI) depends on the box size. We propose an algorithm for calculating MI for continuous variables. The optimum box sizes for a given N are found from the condition of minimum entropy variation with respect to changes in the box sizes. We applied this technique to the Stanford gene expression dataset, which contains microarray data at 18 time points from cultures of the yeast Saccharomyces cerevisiae (Spellman et al.). We calculated MI for all pairs of time points. The MI analysis allowed us to identify time patterns related to different biological processes in the cell.
AB  - Finding relations among gene expressions involves defining a measure of similarity between experimental data. The simplest similarity measure is the correlation coefficient. It can identify only linear dependences and, moreover, is sensitive to experimental errors. An alternative measure, the Shannon mutual information (MI), is free from these weaknesses. However, calculating MI for continuous variables from a finite number N of experimental points involves an ambiguity that arises when the range of values of a continuous variable is divided into boxes: the distribution of experimental points among the boxes (and, therefore, MI) depends on the box size. We propose an algorithm for calculating MI for continuous variables. The optimum box sizes for a given N are found from the condition of minimum entropy variation with respect to changes in the box sizes. We applied this technique to the Stanford gene expression dataset, which contains microarray data at 18 time points from cultures of the yeast Saccharomyces cerevisiae (Spellman et al.). We calculated MI for all pairs of time points. The MI analysis allowed us to identify time patterns related to different biological processes in the cell.
KW  - Gene expression
KW  - Mutual information
UR  - http://www.scopus.com/inward/record.url?scp=33846527894&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33846527894&partnerID=8YFLogxK
U2  - 10.1063/1.2356392
DO  - 10.1063/1.2356392
M3  - Conference contribution
VL  - 854
SP  - 25
EP  - 30
BT  - AIP Conference Proceedings
ER  -