Diagnostics for statistical variable selection methods for prediction of peptic ulcer disease in Helicobacter pylori infection

Hyunsu Ju, Allan R. Brasier, Alexander Kurosky, Bo Xu, Victor Reyes, David Y. Graham

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

Background: The development of accurate classification models depends on the methods used to identify the most relevant variables. The aim of this article is to evaluate variable selection methods for identifying important variables when predicting a binary response with nonlinear statistical models. Our goals in model selection are to produce stable models that do not overfit, are interpretable, generate accurate predictions, and have minimal bias. This work was motivated by data on clinical and laboratory features of Helicobacter pylori infections obtained from 60 individuals enrolled in a prospective observational study.

Results: We carried out a comprehensive performance comparison of several nonlinear classification models on the H. pylori data set. We compared variable selection results from Multivariate Adaptive Regression Splines (MARS), logistic regression with regularization, Generalized Additive Models (GAMs), and Bayesian variable selection in GAMs. We found that the MARS approach has the highest predictive power because the nonlinearity assumptions of candidate predictors are strongly satisfied, a finding demonstrated via deviance chi-square testing procedures in GAMs.

Conclusions: Our results suggest that the physiological free amino acids citrulline, histidine, lysine, and arginine are the major features for predicting H. pylori peptic ulcer disease on the basis of amino acid profiling.
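To make the idea of variable selection by regularized logistic regression concrete, the following is a minimal sketch, not the authors' implementation or data. It assumes an L1 (lasso) penalty fitted by proximal gradient descent, which is one common choice; the abstract does not say which penalty or solver was used. The data are synthetic (60 hypothetical subjects, 6 standardized predictors, outcome driven by predictor 0 alone), and every function name and parameter value here is illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, step=0.2, iters=500):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    The intercept b is left unpenalised. Returns (w, b).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            linear = b + sum(wj * xj for wj, xj in zip(w, row))
            residual = sigmoid(linear) - label
            grad_b += residual / n
            for j in range(p):
                grad_w[j] += residual * row[j] / n
        b -= step * grad_b
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
    return w, b


def make_synthetic_data(n=60, p=6, seed=0):
    """Hypothetical data set: only predictor 0 determines the binary outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if row[0] > 0 else 0 for row in X]
    return X, y


X, y = make_synthetic_data()
w, b = fit_l1_logistic(X, y, lam=0.05)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w])
print("selected predictors:", selected)
```

Predictors whose coefficients are shrunk exactly to zero are dropped from the model. A larger `lam` removes more predictors, and in practice the penalty strength would typically be chosen by cross-validation rather than fixed by hand as it is here.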

Original language: English (US)
Pages (from-to): 95-101
Number of pages: 7
Journal: Journal of Proteomics and Bioinformatics
Volume: 7
Issue number: 4
DOI: 10.4172/jpb.1000308
State: Published - 2014

Fingerprint

Helicobacter Infections
Peptic Ulcer
Helicobacter pylori
Nonlinear Dynamics
Amino Acids
Citrulline
Statistical Models
Histidine
Lysine
Observational Studies
Arginine
Logistic Models
Prospective Studies
Splines
Logistics
Testing

Keywords

  • Amino acid analysis
  • Classification
  • Helicobacter pylori
  • Peptic ulcer disease
  • Variable selection

ASJC Scopus subject areas

  • Biochemistry
  • Cell Biology
  • Molecular Biology
  • Computer Science Applications

Cite this

Diagnostics for statistical variable selection methods for prediction of peptic ulcer disease in Helicobacter pylori infection. / Ju, Hyunsu; Brasier, Allan R.; Kurosky, Alexander; Xu, Bo; Reyes, Victor; Graham, David Y.

In: Journal of Proteomics and Bioinformatics, Vol. 7, No. 4, 2014, p. 95-101.

Ju, Hyunsu ; Brasier, Allan R. ; Kurosky, Alexander ; Xu, Bo ; Reyes, Victor ; Graham, David Y. / Diagnostics for statistical variable selection methods for prediction of peptic ulcer disease in Helicobacter pylori infection. In: Journal of Proteomics and Bioinformatics. 2014 ; Vol. 7, No. 4. pp. 95-101.
@article{1afb00fdf32644d696220929e6dd4448,
title = "Diagnostics for statistical variable selection methods for prediction of peptic ulcer disease in Helicobacter pylori infection",
abstract = "Background: The development of accurate classification models depends upon the methods used to identify the most relevant variables. The aim of this article is to evaluate variable selection methods to identify important variables in predicting a binary response using nonlinear statistical models. Our goals in model selection include producing non-overfitting stable models that are interpretable, that generate accurate predictions and have minimum bias. This work was motivated by data on clinical and laboratory features of Helicobacter pylori infections obtained from 60 individuals enrolled in a prospective observational study. Results: We carried out a comprehensive performance comparison of several nonlinear classification models over the H. pylori data set. We compared variable selection results by Multivariate Adaptive Regression Splines (MARS), Logistic Regression with regularization, Generalized Additive Models (GAMs) and Bayesian Variable Selection in GAMs. We found that the MARS model approach has the highest predictive power because the nonlinearity assumptions of candidate predictors are strongly satisfied, a finding demonstrated via deviance chi-square testing procedures in GAMs. Conclusions: Our results suggest that the physiological free amino acids citrulline, histidine, lysine and arginine are the major features for predicting H. pylori peptic ulcer disease on the basis of amino acid profiling.",
keywords = "Amino acid analysis, Classification, Helicobacter pylori, Peptic ulcer disease, Variable selection",
author = "Hyunsu Ju and Brasier, {Allan R.} and Alexander Kurosky and Bo Xu and Victor Reyes and Graham, {David Y.}",
year = "2014",
doi = "10.4172/jpb.1000308",
language = "English (US)",
volume = "7",
pages = "95--101",
journal = "Journal of Proteomics and Bioinformatics",
issn = "0974-276X",
publisher = "Omics Publishing Group",
number = "4",

}

TY - JOUR

T1 - Diagnostics for statistical variable selection methods for prediction of peptic ulcer disease in Helicobacter pylori infection

AU - Ju, Hyunsu

AU - Brasier, Allan R.

AU - Kurosky, Alexander

AU - Xu, Bo

AU - Reyes, Victor

AU - Graham, David Y.

PY - 2014

Y1 - 2014

AB - Background: The development of accurate classification models depends upon the methods used to identify the most relevant variables. The aim of this article is to evaluate variable selection methods to identify important variables in predicting a binary response using nonlinear statistical models. Our goals in model selection include producing non-overfitting stable models that are interpretable, that generate accurate predictions and have minimum bias. This work was motivated by data on clinical and laboratory features of Helicobacter pylori infections obtained from 60 individuals enrolled in a prospective observational study. Results: We carried out a comprehensive performance comparison of several nonlinear classification models over the H. pylori data set. We compared variable selection results by Multivariate Adaptive Regression Splines (MARS), Logistic Regression with regularization, Generalized Additive Models (GAMs) and Bayesian Variable Selection in GAMs. We found that the MARS model approach has the highest predictive power because the nonlinearity assumptions of candidate predictors are strongly satisfied, a finding demonstrated via deviance chi-square testing procedures in GAMs. Conclusions: Our results suggest that the physiological free amino acids citrulline, histidine, lysine and arginine are the major features for predicting H. pylori peptic ulcer disease on the basis of amino acid profiling.

KW - Amino acid analysis

KW - Classification

KW - Helicobacter pylori

KW - Peptic ulcer disease

KW - Variable selection

UR - http://www.scopus.com/inward/record.url?scp=84899682241&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84899682241&partnerID=8YFLogxK

U2 - 10.4172/jpb.1000308

DO - 10.4172/jpb.1000308

M3 - Article

VL - 7

SP - 95

EP - 101

JO - Journal of Proteomics and Bioinformatics

JF - Journal of Proteomics and Bioinformatics

SN - 0974-276X

IS - 4

ER -