Identifying property based sequence motifs in protein families and superfamilies

Application to DNase-1 related endonucleases

Venkatarajan S. Mathura, Catherine H. Schein, Werner Braun

Research output: Contribution to journal (Article)

33 Citations (Scopus)

Abstract

Motivation: Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise.

Results: We have developed a novel automatic method, based on patterns of conservation of 237 physical-chemical properties of amino acids in aligned protein sequences, to find related motifs in proteins with little or no overall sequence similarity. As an application, our web-server MASIA identified 12 property-based motifs in the apurinic/apyrimidinic endonuclease (APE) family of DNA-repair enzymes of the DNase-I superfamily. Searching with these motifs located distantly related representatives of the DNase-I superfamily, such as Inositol 5′-polyphosphate phosphatases in the ASTRAL40 database, using a Bayesian scoring function. Other proteins containing APE motifs had no overall sequence or structural similarity. However, all were phosphatases and/or had a metal ion binding active site. Thus our automated method can identify discrete elements in distantly related proteins that define local structure and aspects of function. We anticipate that our method will complement existing ones to functionally annotate novel protein sequences from genomic projects.

Original language: English (US)
Pages (from-to): 1381-1390
Number of pages: 10
Journal: Bioinformatics
Volume: 19
Issue number: 11
DOI: 10.1093/bioinformatics/btg164
State: Published, Jul 22, 2003

Fingerprint

Amino Acid Motifs
Deoxyribonucleases
Endonucleases
Proteins
Protein
Protein Sequence
Deoxyribonuclease I
Phosphatases
Phosphoric Monoester Hydrolases
Discrete Elements
Structural Similarity
DNA-(Apurinic or Apyrimidinic Site) Lyase
DNA Repair Enzymes
Web Server
Local Structure
Differentiate
Scoring
Conserved Sequence
Repair
Genomics

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

Identifying property based sequence motifs in protein families and superfamilies: Application to DNase-1 related endonucleases. / Mathura, Venkatarajan S.; Schein, Catherine H.; Braun, Werner.

In: Bioinformatics, Vol. 19, No. 11, 22.07.2003, p. 1381-1390.

@article{82f90982f83f4597987161b2f6474a0a,
title = "Identifying property based sequence motifs in protein families and superfamilies: Application to DNase-1 related endonucleases",
author = "Mathura, {Venkatarajan S.} and Schein, {Catherine H.} and Braun, Werner",
year = "2003",
month = "7",
day = "22",
doi = "10.1093/bioinformatics/btg164",
language = "English (US)",
volume = "19",
pages = "1381--1390",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "11",
}

TY  - JOUR
T1  - Identifying property based sequence motifs in protein families and superfamilies
T2  - Application to DNase-1 related endonucleases
AU  - Mathura, Venkatarajan S.
AU  - Schein, Catherine H.
AU  - Braun, Werner
PY  - 2003/7/22
Y1  - 2003/7/22
UR  - http://www.scopus.com/inward/record.url?scp=0042093623&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0042093623&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btg164
DO  - 10.1093/bioinformatics/btg164
M3  - Article
VL  - 19
SP  - 1381
EP  - 1390
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 11
ER  -