Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever

Hyunsu Ju, Allan R. Brasier

Research output: Contribution to journal › Article

11 Citations (Scopus)

Abstract

Background: The choice of selection methods to identify important variables for binary classification modeling is critical for producing stable models that are interpretable, generate accurate predictions, and have minimal bias. This work is motivated by data on clinical and laboratory features of severe dengue infection (dengue hemorrhagic fever, DHF) obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections. Results: We carried out a comprehensive performance comparison of several classification models for DHF on the dengue data set. We compared variable selection results from Multivariate Adaptive Regression Splines, Learning Ensemble, Random Forest, Bayesian Model Averaging, Stochastic Search Variable Selection, and Generalized Regularized Logistic Regression. Model averaging methods (bagging, boosting and ensemble learners) have higher accuracy, but the generalized regularized regression model has the highest predictive power because the linearity assumptions for the candidate predictors are strongly satisfied, as assessed by deviance chi-square testing procedures. Bootstrapping was applied to evaluate the predictive regression coefficients in the regularized regression model. Conclusions: Feature reduction methods introduce inherent biases and are therefore data-type dependent. We propose that these limitations can be overcome by using an exhaustive approach for searching the feature space. Using this approach, our results suggest that IL-10, platelet count, and lymphocyte count are the major features for predicting DHF on the basis of blood chemistries and cytokine measurements.
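
The bootstrap evaluation of regularized regression coefficients mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic (not the dengue data set), the penalty is assumed to be L1, the solver is a plain proximal gradient loop, and every function name and hyperparameter below is hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of the L1 penalty: shrink v toward zero by t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=150):
    """L1-penalized logistic regression fitted by proximal gradient descent
    for a fixed number of iterations. The intercept is not penalized."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(iters):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
    return b, w


def bootstrap_selection(X, y, n_boot=30, seed=0, **fit_kwargs):
    """Refit the model on bootstrap resamples (rows drawn with replacement)
    and return how often each coefficient is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        _, w = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], **fit_kwargs)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_synthetic(n=100, p=5, seed=1):
    # Synthetic standard-normal features; only features 0 and 1 drive the outcome.
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * x[0] - 2.0 * x[1]) else 0 for x in X]
    return X, y


X, y = make_synthetic()
freq = bootstrap_selection(X, y)
print([round(f, 2) for f in freq])  # selection frequency per feature
```

Features that keep a nonzero coefficient in most resamples are the stable candidates; in a real analysis the penalty strength would be tuned (for example by cross-validation) rather than fixed as it is here.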

Original language: English (US)
Article number: 365
Journal: BMC Research Notes
Volume: 6
Issue number: 1
DOI: 10.1186/1756-0500-6-365
State: Published - 2013

Fingerprint

Severe Dengue
Biomarkers
Dengue
Lymphocyte Count
Infection
Platelet Count
Interleukin-10
Lymphocytes
Observational Studies
Platelets
Logistic Models
Splines
Learning
Prospective Studies
Logistics
Cytokines
Blood
Testing

Keywords

  • Bootstrap sampling
  • Classification
  • Data mining
  • Variable selection

ASJC Scopus subject areas

  • Biochemistry, Genetics and Molecular Biology (all)
  • Medicine (all)

Cite this

Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever. / Ju, Hyunsu; Brasier, Allan R.

In: BMC Research Notes, Vol. 6, No. 1, 365, 2013.

@article{fdc3a53e39a14b00bedf38382594e35a,
title = "Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever",
abstract = "Background: The choice of selection methods to identify important variables for binary classification modeling is critical for producing stable models that are interpretable, generate accurate predictions, and have minimal bias. This work is motivated by data on clinical and laboratory features of severe dengue infection (dengue hemorrhagic fever, DHF) obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections. Results: We carried out a comprehensive performance comparison of several classification models for DHF on the dengue data set. We compared variable selection results from Multivariate Adaptive Regression Splines, Learning Ensemble, Random Forest, Bayesian Model Averaging, Stochastic Search Variable Selection, and Generalized Regularized Logistic Regression. Model averaging methods (bagging, boosting and ensemble learners) have higher accuracy, but the generalized regularized regression model has the highest predictive power because the linearity assumptions for the candidate predictors are strongly satisfied, as assessed by deviance chi-square testing procedures. Bootstrapping was applied to evaluate the predictive regression coefficients in the regularized regression model. Conclusions: Feature reduction methods introduce inherent biases and are therefore data-type dependent. We propose that these limitations can be overcome by using an exhaustive approach for searching the feature space. Using this approach, our results suggest that IL-10, platelet count, and lymphocyte count are the major features for predicting DHF on the basis of blood chemistries and cytokine measurements.",
keywords = "Bootstrap sampling, Classification, Data mining, Variable selection",
author = "Ju, Hyunsu and Brasier, Allan R.",
year = "2013",
doi = "10.1186/1756-0500-6-365",
language = "English (US)",
volume = "6",
journal = "BMC Research Notes",
issn = "1756-0500",
publisher = "BioMed Central",
number = "1",
pages = "365",
}

TY - JOUR

T1 - Variable selection methods for developing a biomarker panel for prediction of dengue hemorrhagic fever

AU - Ju, Hyunsu

AU - Brasier, Allan R.

PY - 2013

Y1 - 2013

AB - Background: The choice of selection methods to identify important variables for binary classification modeling is critical for producing stable models that are interpretable, generate accurate predictions, and have minimal bias. This work is motivated by data on clinical and laboratory features of severe dengue infection (dengue hemorrhagic fever, DHF) obtained from 51 individuals enrolled in a prospective observational study of acute human dengue infections. Results: We carried out a comprehensive performance comparison of several classification models for DHF on the dengue data set. We compared variable selection results from Multivariate Adaptive Regression Splines, Learning Ensemble, Random Forest, Bayesian Model Averaging, Stochastic Search Variable Selection, and Generalized Regularized Logistic Regression. Model averaging methods (bagging, boosting and ensemble learners) have higher accuracy, but the generalized regularized regression model has the highest predictive power because the linearity assumptions for the candidate predictors are strongly satisfied, as assessed by deviance chi-square testing procedures. Bootstrapping was applied to evaluate the predictive regression coefficients in the regularized regression model. Conclusions: Feature reduction methods introduce inherent biases and are therefore data-type dependent. We propose that these limitations can be overcome by using an exhaustive approach for searching the feature space. Using this approach, our results suggest that IL-10, platelet count, and lymphocyte count are the major features for predicting DHF on the basis of blood chemistries and cytokine measurements.

KW - Bootstrap sampling

KW - Classification

KW - Data mining

KW - Variable selection

UR - http://www.scopus.com/inward/record.url?scp=84883651230&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883651230&partnerID=8YFLogxK

U2 - 10.1186/1756-0500-6-365

DO - 10.1186/1756-0500-6-365

M3 - Article

C2 - 24025735

AN - SCOPUS:84883651230

VL - 6

JO - BMC Research Notes

JF - BMC Research Notes

SN - 1756-0500

IS - 1

M1 - 365

ER -