Explainable Artificial Intelligence (XAI) in the Era of Large Language Models: Applying an XAI Framework in Pediatric Ophthalmology Diagnosis using the Gemini Model.
Dipak P Upadhyaya, Katrina Prantzalos, Pedram Golnari, Aasef G Shaikh, Subhashini Sivagnanam, Amitava Majumdar, Fatema F Ghasia, Satya S Sahoo
{"title":"Explainable Artificial Intelligence (XAI) in the Era of Large Language Models: Applying an XAI Framework in Pediatric Ophthalmology Diagnosis using the Gemini Model.","authors":"Dipak P Upadhyaya, Katrina Prantzalos, Pedram Golnari, Aasef G Shaikh, Subhashini Sivagnanam, Amitava Majumdar, Fatema F Ghasia, Satya S Sahoo","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Amblyopia is a neurodevelopmental disorder affecting children's visual acuity, requiring early diagnosis for effective treatment. Traditional diagnostic methods rely on subjective evaluations of eye tracking recordings from high fidelity eye tracking instruments performed by specialized pediatric ophthalmologists, often unavailable in rural, low resource clinics. As such, there is an urgent need to develop a scalable, low cost, high accuracy approach to automatically analyze eye tracking recordings. Large Language Models (LLM) show promise in accurate detection of amblyopia; our prior work has shown that the Google Gemini model, guided by expert ophthalmologists, can detect control and amblyopic subjects from eye tracking recordings. However, there is a clear need to address the issues of transparency and trust in medical applications of LLMs. To bolster the reliability and interpretability of LLM analysis of eye tracking records, we developed a Feature Guided Interprative Prompting (FGIP) framework focused on critical clinical features. Using the Google Gemini model, we classify high-fidelity eye-tracking data to detect amblyopia in children and apply the Quantus framework to evaluate the classification results across key metrics (faithfulness, robustness, localization, and complexity). These metrics provide a quantitative basis for understanding the model's decision-making process. This work presents the first implementation of an Explainable Artificial Intelligence (XAI) framework to systematically characterize the results generated by the Gemini model using high-fidelity eye-tracking data to detect amblyopia in children. Results demonstrated that the model accurately classified control and amblyopic subjects, including those with nystagmus while maintaining transparency and clinical alignment. The results of this study support the development of a scalable and interpretable clinical decision support (CDS) tool using LLMs that has the potential to enhance the trustworthiness of AI applications.</p>","PeriodicalId":72181,"journal":{"name":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","volume":"2025 ","pages":"566-575"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12150742/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AMIA Joint Summits on Translational Science proceedings. AMIA Joint Summits on Translational Science","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Amblyopia is a neurodevelopmental disorder affecting children's visual acuity, and effective treatment requires early diagnosis. Traditional diagnostic methods rely on subjective evaluation, by specialized pediatric ophthalmologists, of recordings from high-fidelity eye-tracking instruments; such expertise is often unavailable in rural, low-resource clinics. There is therefore an urgent need for a scalable, low-cost, high-accuracy approach to automatically analyze eye-tracking recordings. Large Language Models (LLMs) show promise for accurate detection of amblyopia; our prior work has shown that the Google Gemini model, guided by expert ophthalmologists, can distinguish control from amblyopic subjects using eye-tracking recordings. However, there is a clear need to address transparency and trust in medical applications of LLMs. To bolster the reliability and interpretability of LLM analysis of eye-tracking recordings, we developed a Feature-Guided Interpretive Prompting (FGIP) framework focused on critical clinical features. Using the Google Gemini model, we classify high-fidelity eye-tracking data to detect amblyopia in children, and we apply the Quantus framework to evaluate the classification results across key metric families (faithfulness, robustness, localization, and complexity). These metrics provide a quantitative basis for understanding the model's decision-making process. This work presents the first implementation of an Explainable Artificial Intelligence (XAI) framework to systematically characterize results generated by the Gemini model from high-fidelity eye-tracking data for detecting amblyopia in children. The model accurately classified control and amblyopic subjects, including those with nystagmus, while maintaining transparency and clinical alignment. These results support the development of a scalable, interpretable clinical decision support (CDS) tool built on LLMs, with the potential to enhance the trustworthiness of clinical AI applications.
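To make the prompting step concrete, here is a minimal sketch of how a feature-guided interpretive prompt might be issued to the Gemini API via the google-generativeai Python package. The feature names, values, model version, and prompt wording are illustrative assumptions, not the authors' exact FGIP protocol.

```python
# Minimal sketch of feature-guided prompting with the Gemini API.
# The features and prompt text below are hypothetical, not the paper's protocol.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")           # assumption: key supplied externally
model = genai.GenerativeModel("gemini-1.5-pro")   # assumption: model version

# Hypothetical clinically motivated features distilled from a
# high-fidelity eye-tracking recording.
features = {
    "fixation_dispersion_deg": 1.8,   # hypothetical value
    "saccade_amplitude_deg": 2.4,     # hypothetical value
    "nystagmus_present": True,        # hypothetical value
}

prompt = (
    "You are assisting a pediatric ophthalmologist. Using ONLY the "
    "eye-tracking features below, classify the subject as 'control' or "
    "'amblyopic', and explain which features drove your decision.\n"
    + "\n".join(f"- {k}: {v}" for k, v in features.items())
)

response = model.generate_content(prompt)
print(response.text)  # classification plus feature-level rationale
```

Constraining the model to the listed features is what makes the subsequent explanation auditable: each cited feature can be checked against the recording.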
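The Quantus evaluation step can likewise be sketched. Quantus scores explanations given a model and per-feature attributions; because the LLM itself is a black box, the surrogate PyTorch classifier, the placeholder data, and the random attributions below are stand-ins for illustration only. One representative metric per family is shown; localization metrics are omitted because they additionally require ground-truth masks (s_batch).

```python
# Sketch of scoring explanations with Quantus across three of the four
# metric families named in the abstract. All data here are placeholders.
import numpy as np
import torch.nn as nn
import quantus

n_features = 8  # assumption: number of engineered eye-tracking features

# Hypothetical surrogate classifier over the engineered features;
# inputs are shaped (batch, 1, n_features) so Quantus treats them as 1D signals.
model = nn.Sequential(nn.Flatten(), nn.Linear(n_features, 16),
                      nn.ReLU(), nn.Linear(16, 2))
model.eval()

x_batch = np.random.rand(4, 1, n_features).astype(np.float32)  # placeholder inputs
y_batch = np.array([0, 1, 0, 1])                               # placeholder labels
a_batch = np.random.rand(4, 1, n_features).astype(np.float32)  # placeholder attributions

metrics = {
    "faithfulness": quantus.FaithfulnessCorrelation(nr_runs=10, subset_size=4),
    "robustness": quantus.MaxSensitivity(nr_samples=5),
    "complexity": quantus.Sparseness(),
}

for name, metric in metrics.items():
    scores = metric(model=model, x_batch=x_batch, y_batch=y_batch,
                    a_batch=a_batch, device="cpu",
                    explain_func=quantus.explain,
                    explain_func_kwargs={"method": "Saliency"})
    print(name, float(np.mean(scores)))  # mean score per metric family
```

In the paper's setting, the attributions would come from the feature-level rationale elicited by FGIP rather than from random values, and the per-family scores would form the quantitative basis for the transparency claims in the abstract.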