A new deep learning approach to probe the molecular basis of inhalation allergens

Project: Research project

Project Details

Description

Inhalation allergy is a major health problem worldwide and is most commonly caused by allergies to grasses, dust mites, and ragweed. About 45% of the US population is sensitized to one or more allergens from these sources, with 30% of the US population sensitized to ragweed pollen. More than 90% of reactive serum IgE in ragweed-sensitized patients is directed against the group 1 allergen, Amb a 1, a non-glycosylated 38-kDa protein that belongs to the pectate lyase (PL) family. Patients sensitized to ragweed pollen also cross-react to other PL allergens, such as the mugwort and sunflower allergens, Art v 6 and Hel a 6, respectively. However, it is not known whether other PL proteins could cross-react with these allergens. We hypothesize that artificial intelligence (AI) technologies, which have improved dramatically in recent years, can clarify this problem. We will use these innovative technologies to find characteristic sequence and structural features of allergenic PL proteins and to identify new potential PL allergens. As preliminary data, we developed a new convolutional neural network (CNN) approach, SDAP_AI, which is trained on allergen sequences from our updated Structural Database of Allergenic Proteins, SDAP 2.0. Our preliminary model achieved 93.4% accuracy on the test data in ten-fold cross-validation with an 80%/20% partition of training and test data. Our CNN model also performed favorably compared with other machine learning (ML) algorithms, such as AllergenFP and AlgPred 2.0. In addition, our SDAP_AI approach has the potential to clarify what makes a protein allergenic. In the proposed project, we will further optimize SDAP_AI, assess its robustness by testing its predictions on other independent data sets, and apply our CNN model to characterize protein allergenicity.
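As an illustration of the kind of preprocessing a sequence-based CNN typically requires, the following minimal Python sketch one-hot encodes a protein sequence and generates k-fold cross-validation splits. It is not the SDAP_AI implementation: the function names, the zero-padding scheme, and the fold construction are assumptions made for this example, and the exact partitioning used for SDAP_AI is not specified here.

```python
import random

# The 20 standard amino acids; any other character is treated as unknown.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_len):
    """Return a max_len x 20 matrix (hypothetical helper, not from SDAP_AI).

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Unknown residues also map to all-zero rows.
    """
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, residue in enumerate(sequence[:max_len].upper()):
        idx = AA_INDEX.get(residue)
        if idx is not None:
            matrix[pos][idx] = 1
    return matrix


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds;
    each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx
```

A fixed seed keeps the splits reproducible, which matters when comparing a model against other predictors on the same folds.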
We will pursue two aims: 1) optimize a deep learning model, SDAP_AI, for allergenic proteins using sequence information of allergens in SDAP 2.0 and assess its prediction quality; and 2) apply SDAP_AI to identify and experimentally validate new potential allergens and IgE epitope peptides of PL allergens. Potential allergens will be experimentally validated by a peptide microarray assay with sera from patients sensitized to ragweed pollen. We will also apply SDAP_AI to reevaluate previously identified IgE epitopes on PL allergens (e.g., Amb a 1, Jun a 1, Hel a 6) to determine the accuracy of those epitopes, and again validate the results by microarray analysis with human sera. We will map linear IgE epitopes onto the surface of PL allergens to define conformational epitopes. This information will help identify common features that make a protein an allergen and can aid future efforts to predict protein allergenicity. The application of AI technology to allergen research is novel, and our combined experimental and computational approach will yield a powerful new computational paradigm to identify the potential allergenicity of new proteins and help design new immunotherapies to reduce the burden of inhalation allergies. Source code, documentation for use, and example input files will be available from our SDAP 2.0 website.
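Scanning a protein for linear epitopes on a peptide microarray usually starts by splitting its sequence into overlapping peptides. The sketch below shows one way to do this in Python. The function name, the 15-residue peptide length, and the 5-residue step are illustrative assumptions and are not parameters stated for this project.

```python
def overlapping_peptides(sequence, length=15, step=5):
    """Return (start, peptide) pairs with 1-based start positions.

    Hypothetical helper: `length` and `step` are illustrative defaults.
    A final window is added when needed so the C-terminus is covered.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    seq = sequence.upper()
    if not seq:
        return []
    if len(seq) <= length:
        # The whole sequence fits in a single peptide.
        return [(1, seq)]
    last_start = len(seq) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, seq[s:s + length]) for s in starts]
```

Reporting 1-based start positions keeps the peptide coordinates consistent with the residue numbering conventionally used for protein sequences.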
Status: Active
Effective start/end date: 7/1/25 - 6/30/27

Funding

  • National Institute of Allergy and Infectious Diseases (Award #1R21AI19319501): $440,000.00
