Comparative analysis of GPT-4 and Google Gemini's consistency with pediatric otolaryngology guidelines

Nicholas A. Rossi, Kassandra K. Corona, Yuki Yoshiyasu, Yusif Hajiyev, Charles A. Hughes, Harold S. Pine

Research output: Contribution to journal › Article › peer-review

Abstract

Objective: To evaluate the accuracy and completeness of large language models (LLMs) in interpreting pediatric otolaryngology guidelines.

Materials and methods: GPT-4 and Google Gemini were assessed on their responses to queries based on key action statements from three American Academy of Otolaryngology – Head and Neck Surgery Foundation (AAO-HNSF) clinical practice guidelines. Two independent reviewers evaluated responses using Likert scales for accuracy (1–5) and completeness (1–3). Inter-rater reliability was assessed with weighted Cohen's kappa. Statistical comparisons between models were performed using the Wilcoxon signed-rank test.

Results: Both models achieved high scores (GPT-4: accuracy 4.74, completeness 2.94; Google Gemini: accuracy 4.82, completeness 2.98). No significant difference between the models was found in accuracy (p = 0.134) or in completeness (p = 0.34). AI responses often emphasized the importance of individualization and of consulting healthcare professionals.

Conclusion: GPT-4 and Google Gemini demonstrated potential as assistive tools in pediatric otolaryngology. However, limitations exist, including reliance on pre-trained datasets and subjective evaluation methods. Continuous learning and model refinement are crucial for reliable clinical integration. AI should complement, not replace, human expertise. This study contributes to the exploration of LLMs in pediatric otolaryngology.
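The two statistics named in the methods can be illustrated with a minimal sketch. It is not the authors' analysis code. The abstract does not say whether linear or quadratic kappa weights were used, so linear weights are an assumption here, and the function names and all ratings below are hypothetical values made up for illustration. Only the signed-rank statistic is computed. A p-value would normally come from a statistics package.

```python
from collections import Counter


def weighted_kappa(r1, r2, categories):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale.

    Linear weights are an assumption; the abstract does not state the scheme.
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    index = {c: i for i, c in enumerate(categories)}
    k, n = len(categories), len(r1)
    pairs = Counter((index[a], index[b]) for a, b in zip(r1, r2))
    row = Counter(index[a] for a in r1)
    col = Counter(index[b] for b in r2)
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # disagreement weight, 0 on the diagonal
            observed += w * pairs[(i, j)] / n
            expected += w * (row[i] / n) * (col[j] / n)
    if expected == 0:
        return float("nan")  # undefined: both raters used one identical category
    return 1.0 - observed / expected


def signed_rank_statistic(x, y):
    """Return (W+, W-) for paired samples: zero differences are dropped,
    tied absolute differences share their average rank."""
    diffs = sorted((a - b for a, b in zip(x, y) if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        avg = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = avg
        i = j
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical accuracy ratings (1-5), NOT data from the study.
reviewer_a = [5, 4, 5, 3, 5, 4]
reviewer_b = [5, 4, 4, 3, 5, 5]
print(weighted_kappa(reviewer_a, reviewer_b, categories=[1, 2, 3, 4, 5]))

# Hypothetical per-query scores for two models, NOT data from the study.
model_one = [5, 4, 5, 3]
model_two = [4, 4, 3, 4]
print(signed_rank_statistic(model_one, model_two))
```

The kappa function measures how closely the two reviewers agree, penalizing a disagreement in proportion to its distance on the scale. The signed-rank function compares the two models on the same set of queries without assuming normally distributed scores.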

Original language: English (US)
Article number: 112336
Journal: International Journal of Pediatric Otorhinolaryngology
Volume: 193
State: Published - Jun 2025

Keywords

  • Artificial intelligence
  • Child
  • Clinical decision-making
  • Natural language processing
  • Otitis media
  • Polysomnography
  • Tonsillectomy

ASJC Scopus subject areas

  • Pediatrics, Perinatology, and Child Health
  • Otorhinolaryngology
