Statistical approaches to candidate biomarker panel selection

Heidi Spratt, Hyunsu Ju

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

The statistical analysis of robust biomarker candidates is a complex process that is involved in several key steps of the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers, getting a feel for the nature of the data, and judging whether there appear to be any differences among the groups being examined. From there, the data must be pre-processed (Sect. 22.2) so that outliers are handled, missing values are dealt with, and normality is assessed. Once the data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed and differentially expressed proteins are identified. Because the number of differentially expressed proteins is usually larger than warrants further investigation (50+ proteins versus just a handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers to a more manageable number. Once the list has been reduced to the proteins most likely to be useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
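
The hypothesis-testing and feature-reduction steps summarized above can be illustrated with a minimal sketch. This is not the chapter's own procedure: the abstract does not name specific tests, so the permutation test on Welch's t statistic, the Benjamini-Hochberg adjustment, and the top-k ranking are assumptions chosen for illustration. The protein names and log2 intensity values are invented, and all function names are hypothetical.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:
        # Degenerate case: neither group has any spread.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the Welch t statistic."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_candidates(data, alpha=0.05, top_k=2):
    """Keep proteins with adjusted p < alpha, ranked by |t|, at most top_k."""
    names = list(data)
    t_stats = [welch_t(*data[n]) for n in names]
    p_vals = [permutation_p_value(*data[n]) for n in names]
    q_vals = benjamini_hochberg(p_vals)
    significant = [(abs(t), n) for t, n, q in zip(t_stats, names, q_vals) if q < alpha]
    significant.sort(reverse=True)
    return [n for _, n in significant[:top_k]]


# Invented log2 intensities: (control group, disease group) per protein.
intensities = {
    "protein_A": ([10.1, 10.4, 9.8, 10.0, 10.3], [12.2, 12.5, 11.9, 12.4, 12.1]),
    "protein_B": ([8.0, 8.3, 7.9, 8.1, 8.2], [8.1, 8.0, 8.2, 7.9, 8.3]),
    "protein_C": ([5.0, 5.2, 4.9, 5.1, 5.3], [6.4, 6.1, 6.6, 6.3, 6.2]),
}

selected = select_candidates(intensities)
print(selected)
```

In a real study the visualization and pre-processing steps would come first, and the choice of test, significance threshold, and panel size would depend on the data; the values used here (alpha of 0.05, a panel of two) are placeholders.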

Original language: English (US)
Title of host publication: Advances in Experimental Medicine and Biology
Publisher: Springer New York LLC
Pages: 463-492
Number of pages: 30
Volume: 919
DOI: https://doi.org/10.1007/978-3-319-41448-5_22
State: Published - 2016

Publication series

Name: Advances in Experimental Medicine and Biology
Volume: 919
ISSN (Print): 0065-2598
ISSN (Electronic): 2214-8019

Keywords

  • Candidate biomarker selection
  • Data clustering
  • Data consistency
  • Data inspection
  • Data normalization
  • Data transformations
  • Machine learning
  • Outlier detection

ASJC Scopus subject areas

  • Medicine (all)
  • Biochemistry, Genetics and Molecular Biology (all)

Cite this

Spratt, H., & Ju, H. (2016). Statistical approaches to candidate biomarker panel selection. In Advances in Experimental Medicine and Biology (Vol. 919, pp. 463-492). (Advances in Experimental Medicine and Biology; Vol. 919). Springer New York LLC. https://doi.org/10.1007/978-3-319-41448-5_22

PubMed ID: 27975231
Scopus record: http://www.scopus.com/inward/record.url?scp=85006437593&partnerID=8YFLogxK