Another look at matrix correlations

Research output: Contribution to journal › Article

Abstract

MOTIVATION: High throughput technologies are widely employed in modern biomedical research. They yield measurements of a large number of biomolecules in a single experiment. The number of experiments usually is much smaller than the number of measurements in each experiment. The simultaneous measurements of biomolecules provide a basis for a comprehensive, systems view for describing relevant biological processes. Often it is necessary to determine correlations between the data matrices under different conditions or pathways. However, the techniques for analyzing the data with a low number of samples for possible correlations within or between conditions are still in development. Earlier developed correlative measures, such as the RV coefficient, use the trace of the product of data matrices as the most relevant characteristic. However, a recent study has shown that the RV coefficient consistently overestimates the correlations in the case of low sample numbers. To correct for this bias, it was suggested to discard the diagonal elements of the outer products of each data matrix. In this work, a principled approach based on the matrix decomposition generates three trace-independent parts for every matrix. These components are unique, and they are used to determine different aspects of correlations between the original datasets. RESULTS: Simulations show that the decomposition results in the removal of high correlation bias and the dependence on the sample number intrinsic to the RV coefficient. We then use the correlations to analyze a real proteomics dataset. AVAILABILITY AND IMPLEMENTATION: The python code can be downloaded from http://dynamic-proteome.utmb.edu/MatrixCorrelations.aspx. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Original language: English (US)
Pages (from-to): 4748-4753
Number of pages: 6
Journal: Bioinformatics (Oxford, England)
Volume: 35
Issue number: 22
DOI: 10.1093/bioinformatics/btz281
State: Published - Nov 1 2019

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Another look at matrix correlations. / Borzou, Ahmad; Yousefi, Razie; Sadygov, Rovshan G.

In: Bioinformatics (Oxford, England), Vol. 35, No. 22, 01.11.2019, p. 4748-4753.

@article{8f5aad814d7a4574960cfc85d27d45e9,
title = "Another look at matrix correlations",
abstract = "MOTIVATION: High throughput technologies are widely employed in modern biomedical research. They yield measurements of a large number of biomolecules in a single experiment. The number of experiments usually is much smaller than the number of measurements in each experiment. The simultaneous measurements of biomolecules provide a basis for a comprehensive, systems view for describing relevant biological processes. Often it is necessary to determine correlations between the data matrices under different conditions or pathways. However, the techniques for analyzing the data with a low number of samples for possible correlations within or between conditions are still in development. Earlier developed correlative measures, such as the RV coefficient, use the trace of the product of data matrices as the most relevant characteristic. However, a recent study has shown that the RV coefficient consistently overestimates the correlations in the case of low sample numbers. To correct for this bias, it was suggested to discard the diagonal elements of the outer products of each data matrix. In this work, a principled approach based on the matrix decomposition generates three trace-independent parts for every matrix. These components are unique, and they are used to determine different aspects of correlations between the original datasets. RESULTS: Simulations show that the decomposition results in the removal of high correlation bias and the dependence on the sample number intrinsic to the RV coefficient. We then use the correlations to analyze a real proteomics dataset. AVAILABILITY AND IMPLEMENTATION: The python code can be downloaded from http://dynamic-proteome.utmb.edu/MatrixCorrelations.aspx. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.",
author = "Borzou, Ahmad and Yousefi, Razie and Sadygov, Rovshan G.",
year = "2019",
month = "11",
day = "1",
doi = "10.1093/bioinformatics/btz281",
language = "English (US)",
volume = "35",
pages = "4748--4753",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "22",
}

TY - JOUR
T1 - Another look at matrix correlations
AU - Borzou, Ahmad
AU - Yousefi, Razie
AU - Sadygov, Rovshan G.
PY - 2019/11/1
Y1 - 2019/11/1

AB - MOTIVATION: High throughput technologies are widely employed in modern biomedical research. They yield measurements of a large number of biomolecules in a single experiment. The number of experiments usually is much smaller than the number of measurements in each experiment. The simultaneous measurements of biomolecules provide a basis for a comprehensive, systems view for describing relevant biological processes. Often it is necessary to determine correlations between the data matrices under different conditions or pathways. However, the techniques for analyzing the data with a low number of samples for possible correlations within or between conditions are still in development. Earlier developed correlative measures, such as the RV coefficient, use the trace of the product of data matrices as the most relevant characteristic. However, a recent study has shown that the RV coefficient consistently overestimates the correlations in the case of low sample numbers. To correct for this bias, it was suggested to discard the diagonal elements of the outer products of each data matrix. In this work, a principled approach based on the matrix decomposition generates three trace-independent parts for every matrix. These components are unique, and they are used to determine different aspects of correlations between the original datasets. RESULTS: Simulations show that the decomposition results in the removal of high correlation bias and the dependence on the sample number intrinsic to the RV coefficient. We then use the correlations to analyze a real proteomics dataset. AVAILABILITY AND IMPLEMENTATION: The python code can be downloaded from http://dynamic-proteome.utmb.edu/MatrixCorrelations.aspx. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

UR - http://www.scopus.com/inward/record.url?scp=85074963755&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074963755&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btz281
DO - 10.1093/bioinformatics/btz281
M3 - Article
C2 - 31081021
AN - SCOPUS:85074963755
VL - 35
SP - 4748
EP - 4753
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - 22
ER -