Predicting protein-protein interactions from sequence using correlation coefficient and high-quality interaction dataset

Ming Guang Shi, Jun Feng Xia, Xue Ling Li, De Shuang Huang

Research output: Contribution to journal (Article)

62 Scopus citations


Identifying protein-protein interactions (PPIs) is critical for understanding the cellular function of proteins and the machinery of a proteome. PPI data derived from high-throughput technologies are often incomplete and noisy. It is therefore important to develop both computational methods and high-quality interaction datasets for predicting PPIs. A sequence-based method is proposed that combines the correlation coefficient (CC) transformation with a support vector machine (SVM). The CC transformation not only accounts for the neighboring effect within a protein sequence but also describes the level of CC between two protein sequences. A gold standard positives (interacting) dataset, MIPS Core, and a gold standard negatives (non-interacting) dataset, GO-NEG, of the yeast Saccharomyces cerevisiae were mined to evaluate the method objectively and attenuate bias. The SVM model combined with the CC transformation yielded the best performance, with an accuracy of 87.94% on the gold standard positives and gold standard negatives datasets. The MATLAB source code and the datasets are available on request under

Original language: English (US)
Pages (from-to): 891-899
Number of pages: 9
Journal: Amino Acids
Issue number: 3
State: Published - Mar 2010
Externally published: Yes

Keywords



  • Correlation coefficient
  • Gold standard negatives dataset
  • Gold standard positives dataset
  • Protein sequence
  • Protein-protein interactions
  • Support vector machine

ASJC Scopus subject areas

  • Biochemistry
  • Clinical Biochemistry
  • Organic Chemistry
