TreeHugger: A new test for enrichment of gene ontology terms

Daniel Jupiter, Jessica Şahutoǧlu, Vincent VanBuren

Research output: Contribution to journal, Article

4 Citations (Scopus)

Abstract

The Gene Ontology (GO) project provides a structured vocabulary of biological terms used by biological researchers as a tool for standardization of references to biological entities. Genes may be annotated with GO terms to indicate their roles or localizations in the cell. GO has been used in conjunction with high-throughput experimental methods, such as microarrays. In this setting, the interest is to determine whether sets of genes identified by the high-throughput experiment are enriched for GO terms: Do certain terms annotate more genes in the identified set than one might expect? Enriched terms are taken as a potential summary of the cellular function for the identified set of genes and may provide clues leading to new directions for investigation. Current methods for determining whether sets of genes are GO-enriched have certain well-known shortcomings. Many methods do not take the hierarchical structure of the ontology into account in determining enrichment. We address this drawback by introducing a new statistical test (TreeHugger) based on a novel per-gene scoring scheme for GO terms. Given a set of genes and a specified subset of those genes, our method determines enrichment of GO terms in the subset, taking into account the structure of the ontology and ascribing a lower weight to those terms that do not themselves directly annotate the given genes. Tests on simulated and real data indicate that our method is a conservative test for enrichment. Testing TreeHugger on a biological example reveals that it also reduces the redundancy caused by giving high scores to indirect annotations as provided by standard enrichment tests.
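
For orientation, the question the abstract poses ("Do certain terms annotate more genes in the identified set than one might expect?") is commonly answered with a one-sided hypergeometric test. The sketch below illustrates that standard baseline only. It is not the TreeHugger scoring scheme, which this record does not specify, and it ignores the ontology hierarchy, which is the shortcoming the paper addresses. The function name, parameter names, and toy counts are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(population_size, annotated_in_population,
                                subset_size, annotated_in_subset):
    """One-sided hypergeometric p-value for a single GO term.

    Returns P(X >= annotated_in_subset), where X is the number of
    term-annotated genes in a subset of `subset_size` genes drawn without
    replacement from `population_size` genes, of which
    `annotated_in_population` carry the term.
    """
    # X cannot exceed either the subset size or the number of annotated genes.
    upper = min(annotated_in_population, subset_size)
    unannotated = population_size - annotated_in_population
    total = comb(population_size, subset_size)
    # Sum the upper tail; math.comb returns 0 when k > n, so impossible
    # configurations contribute nothing.
    tail = sum(
        comb(annotated_in_population, k) * comb(unannotated, subset_size - k)
        for k in range(annotated_in_subset, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 20 genes measured, 5 annotated with the term,
# 4 genes selected by the experiment, 3 of those annotated.
p_value = hypergeom_enrichment_pvalue(20, 5, 4, 3)
print(f"p = {p_value:.4f}")
```

A small p-value suggests that the term annotates more genes in the subset than chance alone would explain. Because this baseline scores each term independently, a parent term and its children can all appear enriched on the strength of the same genes, which is the kind of redundancy the abstract says TreeHugger reduces.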

Original language: English (US)
Pages (from-to): 210-221
Number of pages: 12
Journal: INFORMS Journal on Computing
Volume: 22
Issue number: 2
DOI: 10.1287/ijoc.1090.0356
State: Published - Mar 2010
Externally published: Yes

Fingerprint

Ontology
Genes
Gene
Throughput
Statistical tests
Microarrays
Standardization
Redundancy

Keywords

  • Data analysis
  • Genomics
  • Microarray
  • Probability
  • Statistics

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Computer Science Applications
  • Management Science and Operations Research

Cite this

TreeHugger: A new test for enrichment of gene ontology terms. / Jupiter, Daniel; Şahutoǧlu, Jessica; VanBuren, Vincent.

In: INFORMS Journal on Computing, Vol. 22, No. 2, 03.2010, p. 210-221.

Jupiter, Daniel; Şahutoǧlu, Jessica; VanBuren, Vincent. / TreeHugger: A new test for enrichment of gene ontology terms. In: INFORMS Journal on Computing. 2010; Vol. 22, No. 2. pp. 210-221.
@article{81ee44f9450b4d5a87772846dc828a4c,
title = "TreeHugger: A new test for enrichment of gene ontology terms",
abstract = "The Gene Ontology (GO) project provides a structured vocabulary of biological terms used by biological researchers as a tool for standardization of references to biological entities. Genes may be annotated with GO terms to indicate their roles or localizations in the cell. GO has been used in conjunction with high-throughput experimental methods, such as microarrays. In this setting, the interest is to determine whether sets of genes identified by the high-throughput experiment are enriched for GO terms: Do certain terms annotate more genes in the identified set than one might expect? Enriched terms are taken as a potential summary of the cellular function for the identified set of genes and may provide clues leading to new directions for investigation. Current methods for determining whether sets of genes are GO-enriched have certain well-known shortcomings. Many methods do not take the hierarchical structure of the ontology into account in determining enrichment. We address this drawback by introducing a new statistical test (TreeHugger) based on a novel per-gene scoring scheme for GO terms. Given a set of genes and a specified subset of those genes, our method determines enrichment of GO terms in the subset, taking into account the structure of the ontology and ascribing a lower weight to those terms that do not themselves directly annotate the given genes. Tests on simulated and real data indicate that our method is a conservative test for enrichment. Testing TreeHugger on a biological example reveals that it also reduces the redundancy caused by giving high scores to indirect annotations as provided by standard enrichment tests.",
keywords = "Data analysis, Genomics, Microarray, Probability, Statistics",
author = "Daniel Jupiter and Jessica Şahutoǧlu and Vincent VanBuren",
year = "2010",
month = "3",
doi = "10.1287/ijoc.1090.0356",
language = "English (US)",
volume = "22",
pages = "210--221",
journal = "INFORMS Journal on Computing",
issn = "1091-9856",
publisher = "INFORMS Inst. for Operations Res. and the Management Sciences",
number = "2",

}

TY - JOUR

T1 - TreeHugger

T2 - A new test for enrichment of gene ontology terms

AU - Jupiter, Daniel

AU - Şahutoǧlu, Jessica

AU - VanBuren, Vincent

PY - 2010/3

Y1 - 2010/3

AB - The Gene Ontology (GO) project provides a structured vocabulary of biological terms used by biological researchers as a tool for standardization of references to biological entities. Genes may be annotated with GO terms to indicate their roles or localizations in the cell. GO has been used in conjunction with high-throughput experimental methods, such as microarrays. In this setting, the interest is to determine whether sets of genes identified by the high-throughput experiment are enriched for GO terms: Do certain terms annotate more genes in the identified set than one might expect? Enriched terms are taken as a potential summary of the cellular function for the identified set of genes and may provide clues leading to new directions for investigation. Current methods for determining whether sets of genes are GO-enriched have certain well-known shortcomings. Many methods do not take the hierarchical structure of the ontology into account in determining enrichment. We address this drawback by introducing a new statistical test (TreeHugger) based on a novel per-gene scoring scheme for GO terms. Given a set of genes and a specified subset of those genes, our method determines enrichment of GO terms in the subset, taking into account the structure of the ontology and ascribing a lower weight to those terms that do not themselves directly annotate the given genes. Tests on simulated and real data indicate that our method is a conservative test for enrichment. Testing TreeHugger on a biological example reveals that it also reduces the redundancy caused by giving high scores to indirect annotations as provided by standard enrichment tests.

KW - Data analysis

KW - Genomics

KW - Microarray

KW - Probability

KW - Statistics

UR - http://www.scopus.com/inward/record.url?scp=77952049609&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952049609&partnerID=8YFLogxK

U2 - 10.1287/ijoc.1090.0356

DO - 10.1287/ijoc.1090.0356

M3 - Article

AN - SCOPUS:77952049609

VL - 22

SP - 210

EP - 221

JO - INFORMS Journal on Computing

JF - INFORMS Journal on Computing

SN - 1091-9856

IS - 2

ER -