TY - JOUR
T1 - Identifying property based sequence motifs in protein families and superfamilies
T2 - Application to DNase-1 related endonucleases
AU - Mathura, Venkatarajan S.
AU - Schein, Catherine H.
AU - Braun, Werner
N1 - Funding Information: This work was supported by the U.S. Department of Energy (DE-FG-00ER63041), a Research Development Grant (#2535-01) of the John Sealy Memorial Endowment Fund, the U.S. Food and Drug Administration (FDA-U-002249-01) and the Advanced Research Program of the Texas Higher Education Coordinating Board. We thank Dr Numan Oezguen and Dr Tadahide Izumi for fruitful discussions and Ms Cynthia Orlea for assistance in preparing the manuscript.
PY - 2003/7/22
Y1 - 2003/7/22
N2 - Motivation: Identification of short conserved sequence motifs common to a protein family or superfamily can be more useful than overall sequence similarity in suggesting the function of novel gene products. Locating motifs still requires expert knowledge, as automated methods using stringent criteria may not differentiate subtle similarities from statistical noise. Results: We have developed a novel automatic method, based on patterns of conservation of 237 physical-chemical properties of amino acids in aligned protein sequences, to find related motifs in proteins with little or no overall sequence similarity. As an application, our web-server MASIA identified 12 property-based motifs in the apurinic/apyrimidinic endonuclease (APE) family of DNA-repair enzymes of the DNase-I superfamily. Searching with these motifs located distantly related representatives of the DNase-I superfamily, such as Inositol 5′-polyphosphate phosphatases in the ASTRAL40 database, using a Bayesian scoring function. Other proteins containing APE motifs had no overall sequence or structural similarity. However, all were phosphatases and/or had a metal ion binding active site. Thus our automated method can identify discrete elements in distantly related proteins that define local structure and aspects of function. We anticipate that our method will complement existing ones to functionally annotate novel protein sequences from genomic projects.
UR - http://www.scopus.com/inward/record.url?scp=0042093623&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0042093623&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btg164
DO - 10.1093/bioinformatics/btg164
M3 - Article
C2 - 12874050
AN - SCOPUS:0042093623
SN - 1367-4803
VL - 19
SP - 1381
EP - 1390
JO - Bioinformatics
JF - Bioinformatics
IS - 11
ER -