TY - CHAP
T1 - Statistical approaches to candidate biomarker panel selection
AU - Spratt, Heidi
AU - Ju, Hyunsu
PY - 2016
Y1 - 2016
N2 - The statistical analysis of robust biomarker candidates is a complex process that spans several key steps in the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers, getting a feel for the nature of the data, and seeing whether there appear to be any differences among the groups being examined. From there, the data must be pre-processed (Sect. 22.2) so that outliers are handled, missing values are dealt with, and normality is assessed. Once the data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed to identify differentially expressed proteins. Since the number of differentially expressed proteins is usually larger than warrants further investigation (50+ proteins versus just a handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers to a more manageable number. Once the list has been reduced to the proteins likely to be most useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
AB - The statistical analysis of robust biomarker candidates is a complex process that spans several key steps in the overall biomarker development pipeline (see Fig. 22.1, Chap. 19). Initially, data visualization (Sect. 22.1, below) is important for identifying outliers, getting a feel for the nature of the data, and seeing whether there appear to be any differences among the groups being examined. From there, the data must be pre-processed (Sect. 22.2) so that outliers are handled, missing values are dealt with, and normality is assessed. Once the data have been cleaned and are ready for downstream analysis, hypothesis tests (Sect. 22.3) are performed to identify differentially expressed proteins. Since the number of differentially expressed proteins is usually larger than warrants further investigation (50+ proteins versus just a handful that will be considered for a biomarker panel), some form of feature reduction (Sect. 22.4) should be performed to narrow the list of candidate biomarkers to a more manageable number. Once the list has been reduced to the proteins likely to be most useful for downstream classification, unsupervised or supervised learning is performed (Sects. 22.5 and 22.6, respectively).
KW - Candidate biomarker selection
KW - Data clustering
KW - Data consistency
KW - Data inspection
KW - Data normalization
KW - Data transformations
KW - Machine learning
KW - Outlier detection
UR - http://www.scopus.com/inward/record.url?scp=85006437593&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85006437593&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-41448-5_22
DO - 10.1007/978-3-319-41448-5_22
M3 - Chapter
C2 - 27975231
AN - SCOPUS:85006437593
VL - 919
T3 - Advances in Experimental Medicine and Biology
SP - 463
EP - 492
BT - Advances in Experimental Medicine and Biology
PB - Springer New York LLC
ER -