
AllergenAI: a deep learning model predicting allergenicity based on protein sequence

  • Jiajia Liu
  • Surendra S. Negi
  • Chengyuan Yang
  • Xiaobo Zhou
  • Catherine H. Schein
  • Werner Braun
  • Pora Kim

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Innovations in protein engineering offer promising ways to redesign allergenic proteins so that they cause fewer adverse reactions in sensitive individuals. Earlier allergenicity prediction models relied on knowledge of physicochemical properties and on sequence homology to assess potential risk. To better understand the sequence features of allergenic proteins, however, a new sequence-based deep learning model for predicting allergenicity is needed.

Results: We present AllergenAI, a novel AI-based tool that quantifies the allergenic potential of a protein from its sequence alone, without using any other known features. We trained a convolutional neural network (CNN) on allergenic protein sequences archived in three well-established databases (SDAP 2.0, COMPARE, and AlgPred 2) and assessed its prediction performance by cross-validation. We then used AllergenAI to identify novel proteins of the cupin family in date palm, spinach, maize, and red clover that have high allergenicity scores and might therefore cause adverse allergic reactions in sensitive individuals. Analysis of the feature importance scores (FIS) of vicilins revealed a proline-alanine-rich (P-A) motif within the top 50% of FIS regions that overlapped with known IgE epitope regions of vicilin allergens. Finally, in a pilot study using the approximately 1600 allergen structures in our SDAP database, we showed the potential of incorporating 3D structural information into a CNN model; doing so slightly improved prediction quality.

Conclusion: Our allergenicity prediction study, through the development of AllergenAI, provides a foundation for identifying the critical features that distinguish allergenic proteins.
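The abstract does not describe AllergenAI's layers or input encoding, so the sketch below only illustrates the general idea of scoring a protein sequence with a convolutional filter. It is not the authors' architecture. It assumes a one-hot encoding of the 20 standard amino acids, a single hand-set filter, global max pooling, and a sigmoid output. The names `one_hot`, `conv1d`, `allergenicity_score`, and `pa_kernel` are hypothetical. A real model would learn many filters from training data such as SDAP 2.0, COMPARE, and AlgPred 2 rather than rely on hand-chosen weights. The toy filter responds to the dipeptide "PA", which loosely echoes the P-A motif mentioned above but is not the motif the authors identified.

```python
import math

# Assumption: one-hot encoding over the 20 standard amino acids.
# Any other letter (e.g. X, B, Z) maps to an all-zero vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode a protein sequence as a list of 20-dimensional one-hot vectors."""
    encoded = []
    for residue in sequence.upper():
        vec = [0.0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        if idx is not None:
            vec[idx] = 1.0
        encoded.append(vec)
    return encoded


def conv1d(encoded, kernel, bias=0.0):
    """Valid 1D convolution of one filter (width rows x 20 weights) along the sequence, then ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], encoded[start + offset]))
        outputs.append(max(0.0, total))  # ReLU activation
    return outputs


def allergenicity_score(sequence, kernel, bias=0.0, out_weight=1.0, out_bias=0.0):
    """Hypothetical single-filter scorer: convolution -> global max pooling -> sigmoid in (0, 1)."""
    activations = conv1d(one_hot(sequence), kernel, bias)
    pooled = max(activations) if activations else 0.0  # sequences shorter than the filter pool to 0
    return 1.0 / (1.0 + math.exp(-(out_weight * pooled + out_bias)))


# Toy width-2 filter that fires on the dipeptide "PA" (weights set by hand, not learned).
pa_kernel = [[0.0] * len(AMINO_ACIDS) for _ in range(2)]
pa_kernel[0][AA_INDEX["P"]] = 1.0
pa_kernel[1][AA_INDEX["A"]] = 1.0

print(round(allergenicity_score("MKPAL", pa_kernel), 4))  # 0.8808, i.e. sigmoid(2)
print(allergenicity_score("MKLLG", pa_kernel))  # 0.5, since no window activates the filter
```

Global max pooling makes the score depend on the strongest local match anywhere in the sequence rather than on its position. This is one common reason convolutional models suit variable-length protein inputs.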

Original language: English (US)
Article number: 279
Journal: BMC Bioinformatics
Volume: 26
Issue number: 1
State: Published - Dec 2025

Keywords

  • 3D structure
  • Allergenic proteins
  • CNN
  • Deep learning
  • Novel vicilin allergen analogs

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
