### Abstract

MOTIVATION: High-throughput technologies are widely employed in modern biomedical research. They yield measurements of a large number of biomolecules in a single experiment. The number of experiments is usually much smaller than the number of measurements in each experiment. The simultaneous measurements of biomolecules provide a basis for a comprehensive, systems-level view of the relevant biological processes. It is often necessary to determine correlations between data matrices obtained under different conditions or for different pathways. However, techniques for detecting correlations within or between conditions in data with few samples are still under development. Previously developed correlation measures, such as the RV coefficient, use the trace of the product of the data matrices as the most relevant characteristic. However, a recent study has shown that the RV coefficient consistently overestimates correlations when the number of samples is low. To correct for this bias, it has been suggested that the diagonal elements of the outer product of each data matrix be discarded. In this work, a principled approach based on matrix decomposition generates three trace-independent parts for every matrix. These components are unique and are used to determine different aspects of the correlations between the original datasets. RESULTS: Simulations show that the decomposition removes the high-correlation bias and the dependence on sample number that are intrinsic to the RV coefficient. We then apply these correlations to a real proteomics dataset. AVAILABILITY AND IMPLEMENTATION: The Python code can be downloaded from http://dynamic-proteome.utmb.edu/MatrixCorrelations.aspx. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
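The abstract contrasts the classical RV coefficient, which is built on traces of products of the samples-by-samples matrices S = XXᵀ and T = YYᵀ, with a variant that discards the diagonal elements of those outer products. A minimal NumPy sketch of both measures follows, assuming X and Y are samples-by-variables arrays with the same number of rows and that columns are mean-centered, as in the usual RV definition. The function names are hypothetical; this is not the authors' downloadable code, and it does not implement the paper's three-part matrix decomposition. Whether the diagonal-free variant here matches the published correction in every detail is also an assumption.

```python
import numpy as np


def _outer_products(x: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return the n-by-n matrices S = Xc Xc^T and T = Yc Yc^T after column centering."""
    if x.shape[0] != y.shape[0]:
        raise ValueError("X and Y must have the same number of samples (rows)")
    # Assumption: each variable (column) is mean-centered before forming S and T.
    xc = x - x.mean(axis=0, keepdims=True)
    yc = y - y.mean(axis=0, keepdims=True)
    return xc @ xc.T, yc @ yc.T


def rv_coefficient(x: np.ndarray, y: np.ndarray) -> float:
    """Classical RV coefficient: tr(ST) / sqrt(tr(S^2) * tr(T^2))."""
    s, t = _outer_products(x, y)
    return float(np.trace(s @ t) / np.sqrt(np.trace(s @ s) * np.trace(t @ t)))


def modified_rv(x: np.ndarray, y: np.ndarray) -> float:
    """Hypothetical RV-type coefficient with the diagonals of S and T zeroed first."""
    s, t = _outer_products(x, y)
    np.fill_diagonal(s, 0.0)  # discard diagonal elements of each outer product
    np.fill_diagonal(t, 0.0)
    # S and T are symmetric, so tr(ST) equals the sum of their elementwise product.
    return float(np.sum(s * t) / np.sqrt(np.sum(s * s) * np.sum(t * t)))
```

For example, `rv_coefficient(x, 3.0 * x + 2.0)` returns 1 up to rounding: centering removes the offset, and the normalization cancels the scale factor. Because S and T are positive semidefinite, the classical coefficient always lies between 0 and 1. The diagonal-free variant can take negative values.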

| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 4748-4753 |
| Number of pages | 6 |
| Journal | Bioinformatics (Oxford, England) |
| Volume | 35 |
| Issue number | 22 |
| DOIs | https://doi.org/10.1093/bioinformatics/btz281 |
| State | Published - Nov 1 2019 |

### ASJC Scopus subject areas

- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics

### Cite this

Borzou, A., Yousefi, R., & Sadygov, R. G. (2019). Another look at matrix correlations. *Bioinformatics (Oxford, England)*, *35*(22), 4748-4753. https://doi.org/10.1093/bioinformatics/btz281

**Another look at matrix correlations.** / Borzou, Ahmad; Yousefi, Razie; Sadygov, Rovshan G.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Another look at matrix correlations

AU - Borzou, Ahmad

AU - Yousefi, Razie

AU - Sadygov, Rovshan G.

PY - 2019/11/1

Y1 - 2019/11/1

UR - http://www.scopus.com/inward/record.url?scp=85074963755&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85074963755&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btz281

DO - 10.1093/bioinformatics/btz281

M3 - Article

C2 - 31081021

AN - SCOPUS:85074963755

VL - 35

SP - 4748

EP - 4753

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 22

ER -