Diagnostics for statistical variable selection methods for prediction of peptic ulcer disease in Helicobacter pylori infection

Hyunsu Ju, Allan R. Brasier, Alexander Kurosky, Bo Xu, Victor E. Reyes, David Y. Graham

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Background: The development of accurate classification models depends upon the methods used to identify the most relevant variables. The aim of this article is to evaluate variable selection methods to identify important variables in predicting a binary response using nonlinear statistical models. Our goals in model selection include producing non-overfitting stable models that are interpretable, that generate accurate predictions and have minimum bias. This work was motivated by data on clinical and laboratory features of Helicobacter pylori infections obtained from 60 individuals enrolled in a prospective observational study.

Results: We carried out a comprehensive performance comparison of several nonlinear classification models over the H. pylori data set. We compared variable selection results by Multivariate Adaptive Regression Splines (MARS), Logistic Regression with regularization, Generalized Additive Models (GAMs) and Bayesian Variable Selection in GAMs. We found that the MARS model approach has the highest predictive power because the nonlinearity assumptions of candidate predictors are strongly satisfied, a finding demonstrated via deviance chi-square testing procedures in GAMs.

Conclusions: Our results suggest that the physiological free amino acids citrulline, histidine, lysine and arginine are the major features for predicting H. pylori peptic ulcer disease on the basis of amino acid profiling.

Original language: English (US)
Pages (from-to): 95-101
Number of pages: 7
Journal: Journal of Proteomics and Bioinformatics
Issue number: 4
State: Published - 2014


Keywords

  • Amino acid analysis
  • Classification
  • Helicobacter pylori
  • Peptic ulcer disease
  • Variable selection

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Cell Biology

