TY - JOUR
T1 - A novel estimator of the interaction matrix in Graphical Gaussian Model of omics data using the entropy of non-equilibrium systems
AU - Borzou, Ahmad
AU - Sadygov, Rovshan G.
N1 - Publisher Copyright:
© The Author(s) 2020. Published by Oxford University Press. All rights reserved.
PY - 2021/3/15
Y1 - 2021/3/15
AB - Motivation: Inferring the direct relationships between biomolecules from omics datasets is essential for understanding biological and disease mechanisms. The Gaussian Graphical Model (GGM) provides a fairly simple and accurate representation of these interactions. However, estimating the associated interaction matrix from data is challenging because the number of measured molecules is high and the number of samples is low. Results: In this article, we use the thermodynamic entropy of the non-equilibrium system of molecules, together with the data-driven constraints among their expressions, to derive an analytic formula for the interaction matrix of Gaussian models. Through a data simulation, we show that our method returns an improved estimate of the interaction matrix. Using the developed method, we also estimate the interaction matrix associated with the plasma proteome and construct the corresponding GGM. We show that known NAFLD-related proteins such as ADIPOQ, APOC, APOE, DPP4, CAT, GC, HP, CETP, SERPINA1, COLA1, PIGR, IGHD, SAA1 and FCGBP are among the top 15% most interacting proteins of the dataset.
UR - http://www.scopus.com/inward/record.url?scp=85106069678&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85106069678&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btaa894
DO - 10.1093/bioinformatics/btaa894
M3 - Article
C2 - 33067612
AN - SCOPUS:85106069678
SN - 1367-4803
VL - 37
SP - 837
EP - 844
JO - Bioinformatics
JF - Bioinformatics
IS - 6
ER -