MOTIVATION: High-throughput technologies are widely employed in modern biomedical research. They yield measurements of a large number of biomolecules in a single experiment, and the number of experiments is usually much smaller than the number of measurements in each experiment. The simultaneous measurements of biomolecules provide a basis for a comprehensive, systems-level view of the relevant biological processes. It is often necessary to determine correlations between data matrices obtained under different conditions or for different pathways. However, techniques for detecting correlations within or between conditions in data with a low number of samples are still in development. Previously developed correlative measures, such as the RV coefficient, use the trace of the product of data matrices as the most relevant characteristic. A recent study has shown, however, that the RV coefficient consistently overestimates correlations when the sample number is low. To correct for this bias, it was suggested to discard the diagonal elements of the outer products of each data matrix. In this work, we present a principled approach based on a matrix decomposition that generates three trace-independent parts for every matrix. These components are unique, and they are used to determine different aspects of the correlations between the original datasets.

RESULTS: Simulations show that the decomposition removes the high-correlation bias and the dependence on the sample number that are intrinsic to the RV coefficient. We then use the correlations to analyze a real proteomics dataset.

AVAILABILITY AND IMPLEMENTATION: The Python code can be downloaded from http://dynamic-proteome.utmb.edu/MatrixCorrelations.aspx.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
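As a rough illustration of the two earlier measures named in the motivation, the sketch below computes the standard RV coefficient, tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)), and the variant that discards the diagonal of each outer product. It is not the decomposition proposed in this work. The function name `rv_coefficient`, the `drop_diagonal` flag, and the simulated matrix sizes are assumptions made for the example, and any preprocessing of the data (such as centering) is assumed to be done by the caller.

```python
import numpy as np


def rv_coefficient(X, Y, drop_diagonal=False):
    """RV-type correlation between two data matrices sharing the same samples (rows).

    Hypothetical helper for illustration only; X and Y are assumed to be
    preprocessed (e.g. centered) by the caller.
    """
    # Sample-by-sample outer products (n x n), independent of the number of columns.
    Sx = X @ X.T
    Sy = Y @ Y.T
    if drop_diagonal:
        # Bias correction mentioned in the text: discard the diagonal elements.
        Sx = Sx - np.diag(np.diag(Sx))
        Sy = Sy - np.diag(np.diag(Sy))
    numerator = np.trace(Sx @ Sy)
    denominator = np.sqrt(np.trace(Sx @ Sx) * np.trace(Sy @ Sy))
    return numerator / denominator


# Two independent zero-mean random matrices: few samples, many measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 1000))
Y = rng.standard_normal((5, 1000))

rv_plain = rv_coefficient(X, Y)
rv_no_diag = rv_coefficient(X, Y, drop_diagonal=True)
print(f"RV coefficient:            {rv_plain:.3f}")
print(f"RV with diagonals dropped: {rv_no_diag:.3f}")
```

In this setting the diagonal elements of each outer product dominate the traces, so the plain coefficient is expected to be close to 1 even though the two matrices are unrelated. Dropping the diagonals removes that contribution.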
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics