GlaucoRAG: A Retrieval-Augmented Large Language Model for Expert-Level Glaucoma Assessment
Mohammad Aminan, S Solomon Darnell, Mohammad Delsoz, Amin Nabavi, Claire Wright, Brian Jerkins, Siamak Yousefi
medRxiv: the preprint server for health sciences. Published 2025-07-07. DOI: 10.1101/2025.07.03.25330805 (https://doi.org/10.1101/2025.07.03.25330805)
Abstract
Purpose: Accurate glaucoma assessment is challenging because of the complexity and chronic nature of the disease; therefore, there is a critical need for models that provide evidence-based, accurate assessment. The purpose of this study was to evaluate the capabilities of a glaucoma-specialized Retrieval-Augmented Generation (RAG) framework (GlaucoRAG) that leverages a large language model (LLM) for diagnosing glaucoma and answering glaucoma-specific questions.
Design: Evaluation of diagnostic capabilities and knowledge of emerging technologies in glaucoma assessment.
Participants: Detailed case reports from 11 patients and 250 multiple-choice questions from the Basic and Clinical Science Course (BCSC) Self-Assessment were used to test the LLM-based GlaucoRAG. No human participants were involved.
Methods: We developed GlaucoRAG, a RAG framework leveraging GPT-4.5-PREVIEW integrated with the R2R platform for automated question answering in glaucoma. We created a glaucoma knowledge base comprising more than 1,800 peer-reviewed glaucoma articles, 15 guidelines and three glaucoma textbooks. The diagnostic performance was tested on case reports and multiple-choice questions. Model outputs were compared with the independent answers of three glaucoma specialists, DeepSeek-R1, and GPT-4.5-PREVIEW (without RAG). Quantitative performance was further assessed with the RAG Assessment (RAGAS) framework, reporting faithfulness, context precision, context recall, and answer relevancy.
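The retrieve-then-generate pattern described in the Methods can be illustrated with a minimal sketch. This is not the GlaucoRAG implementation and does not use the R2R API: the lexical-overlap scorer, the function names (`tokenize`, `score`, `retrieve`, `build_prompt`), and the placeholder passages are all hypothetical, chosen only to show how retrieved passages are placed in front of a question before it is sent to an LLM.

```python
import re
from collections import Counter

# Hypothetical placeholder passages; a real knowledge base would hold
# text chunks from articles, guidelines, and textbooks.
KNOWLEDGE_BASE = [
    "Placeholder passage about intraocular pressure measurement.",
    "Placeholder passage about visual field testing.",
    "Placeholder passage about optic nerve head imaging.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, passage):
    """Toy relevance score: number of tokens shared by query and passage."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)

def retrieve(query, passages, k=2):
    """Return the k passages with the highest overlap score."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query, passages, k=2):
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

if __name__ == "__main__":
    print(build_prompt("How is intraocular pressure measured?", KNOWLEDGE_BASE, k=1))
```

A production system would typically replace the overlap scorer with embedding-based search and pass the assembled prompt to the language model; both steps are omitted here.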
Main outcome measures: The primary outcome measure was GlaucoRAG's diagnostic accuracy on patient case reports and percentage of correct responses to the BCSC Self-Assessment glaucoma items, compared with the performance of glaucoma specialists and two benchmark LLMs. Secondary outcomes included RAGAS subscores.
Results: GlaucoRAG achieved an accuracy of 81.8% on glaucoma case reports, compared with 72.7% for GPT-4.5-PREVIEW and 63.7% for DeepSeek-R1. On glaucoma BCSC Self-Assessment questions, GlaucoRAG achieved 91.2% accuracy (228 / 250), whereas GPT-4.5-PREVIEW and DeepSeek-R1 attained 84.4% (211 / 250) and 76.0% (190 / 250), respectively. The RAGAS evaluation returned an answer relevancy of 91%, with 80% context recall, 70% faithfulness, and 59% context precision.
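The BCSC percentages above follow directly from the reported counts of correct answers. The short sketch below reproduces that arithmetic and the gaps in percentage points; the counts are taken from the Results, while the helper name `accuracy_pct` and the variable names are illustrative only.

```python
def accuracy_pct(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

TOTAL_QUESTIONS = 250  # BCSC Self-Assessment glaucoma items

# Correct-answer counts as reported in the Results.
correct_counts = {
    "GlaucoRAG": 228,
    "GPT-4.5-PREVIEW": 211,
    "DeepSeek-R1": 190,
}

accuracy = {m: accuracy_pct(c, TOTAL_QUESTIONS) for m, c in correct_counts.items()}

# Gap between GlaucoRAG and each benchmark LLM, in percentage points.
gap = {
    m: round(accuracy["GlaucoRAG"] - a, 1)
    for m, a in accuracy.items()
    if m != "GlaucoRAG"
}

if __name__ == "__main__":
    for model, acc in accuracy.items():
        print(f"{model}: {acc}%")
    for model, g in gap.items():
        print(f"GlaucoRAG vs {model}: +{g} percentage points")
```

On these counts, GlaucoRAG answered 17 more questions correctly than GPT-4.5-PREVIEW without RAG and 38 more than DeepSeek-R1.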
Conclusions: The glaucoma-specialized LLM, GlaucoRAG, showed encouraging performance in glaucoma assessment and may complement glaucoma research and clinical practice, as well as support question answering for glaucoma patients.