TY - JOUR
T1 - DGraph Clusters Flaviviruses and β-Coronaviruses According to Their Hosts, Disease Type, and Human Cell Receptors
AU - Braun, Benjamin A.
AU - Schein, Catherine H.
AU - Braun, Werner
N1 - Funding Information:
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the National Institutes of Health (grant nos. AI137332 to W.B., AI064913 to W.B./C.H.S., and AI105985 to C.H.S.). The computational resources of the Sealy Center for Structural Biology and Molecular Biophysics were also used in this project.
Publisher Copyright:
© The Author(s) 2021.
PY - 2021
Y1 - 2021
N2 - Motivation: There is a need for rapid, easy-to-use, alignment-free methods to cluster large groups of protein sequence data. Commonly used phylogenetic trees based on alignments can be used to visualize only a limited number of protein sequences. DGraph, introduced here, is an application developed to generate 2-dimensional (2D) maps based on similarity scores for sequences. The program automatically calculates and graphically displays property distance (PD) scores based on physico-chemical property (PCP) similarities from a list of unaligned sequences in FASTA files. Such “PD-graphs” show the interrelatedness of the sequences, whereby clusters can reveal deeper connectivities. Results: Property distance graphs generated for flavivirus (FV), enterovirus (EV), and coronavirus (CoV) sequences from complete polyproteins or individual proteins are consistent with biological data on vector types, hosts, cellular receptors, and disease phenotypes. Property distance graphs separate the tick-borne from the mosquito-borne FV and place viruses that infect bats, camels, seabirds, and humans in separate clusters. The clusters correlate with disease phenotype. The PD method segregates the β-CoV spike proteins of severe acute respiratory syndrome (SARS), severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and Middle East respiratory syndrome (MERS) sequences from other human-pathogenic CoV, with clustering consistent with cellular receptor usage. The graphs also suggest evolutionary relationships that may be difficult to determine with conventional bootstrapping methods that require postulating an ancestral sequence.
KW - Alignment free clustering
KW - Enterovirus classification
KW - Physical-chemical property (PCP) consensus
KW - Property distances (PD) of viral sequences
KW - SARS origin
UR - http://www.scopus.com/inward/record.url?scp=85107540453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85107540453&partnerID=8YFLogxK
U2 - 10.1177/11779322211020316
DO - 10.1177/11779322211020316
M3 - Article
AN - SCOPUS:85107540453
SN - 1177-9322
VL - 15
JO - Bioinformatics and Biology Insights
JF - Bioinformatics and Biology Insights
ER -