Pilot study of ASSORT (AI-based Symptom Stratification in Ophthalmology for Rapid Triage): a triage tool for ophthalmic emergencies
Claudio Xompero, Lorenzo Rossi, Francesca Amoroso, Antonio Bechara Ghobril, Diana Elena Ionita, Eric H. Souied, Carl-Joe Mehanna
AJO International, 2(3), Article 100159, published 22 July 2025
DOI: 10.1016/j.ajoint.2025.100159
https://www.sciencedirect.com/science/article/pii/S2950253525000620
Abstract
Introduction
ASSORT (AI-based Symptom Stratification in Ophthalmology for Rapid Triage) is a GPT-4-based triage tool designed to assess ophthalmic emergencies using a three-tier color-coded system. This study compares ASSORT to the Rescue triage method, using the ophthalmologist’s final assessment as the reference standard.
Materials and methods
A prospective study was conducted at the Créteil University Hospital from April to June 2024. Each patient underwent triage using ASSORT, followed by the Rescue triage method. Both tools used the same color-coding system to stratify severity: yellow for emergency cases, green for urgent cases, and white for non-urgent cases. An examining ophthalmologist in their final year of residency performed the final assessment. Concordance between the ophthalmologist and each of the tools was analyzed using Cohen’s kappa coefficient, alongside precision and recall metrics.
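The abstract names the agreement statistics but not how they were computed. The following is a minimal sketch that assumes unweighted Cohen's kappa and per-tier (one-vs-rest) precision and recall, with the ophthalmologist's tier treated as ground truth. The function names, the `TIERS` tuple and the six toy cases are hypothetical illustrations, not study data or the authors' code.

```python
from collections import Counter

# Hypothetical encoding of the three-tier colour scale (lowest to highest urgency).
TIERS = ("white", "green", "yellow")  # non-urgent, urgent, emergency


def cohen_kappa(reference, predicted, labels=TIERS):
    """Unweighted Cohen's kappa between two raters who labelled the same cases."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(reference)
    # Observed agreement: share of cases given the same tier by both raters.
    p_o = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of each rater's marginal tier frequencies.
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    p_e = sum(ref_counts[k] * pred_counts[k] for k in labels) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1.0 - p_e)


def precision_recall(reference, predicted, labels=TIERS):
    """Per-tier (one-vs-rest) precision and recall, with `reference` as ground truth."""
    scores = {}
    for k in labels:
        tp = sum(r == k and p == k for r, p in zip(reference, predicted))
        n_pred = sum(p == k for p in predicted)  # cases the tool placed in tier k
        n_ref = sum(r == k for r in reference)   # cases the reference placed in tier k
        precision = tp / n_pred if n_pred else float("nan")
        recall = tp / n_ref if n_ref else float("nan")
        scores[k] = (precision, recall)
    return scores


# Invented toy labels (NOT study data); the tool over-triages two cases.
reference = ["white", "white", "green", "green", "green", "yellow"]
tool = ["white", "green", "green", "green", "yellow", "yellow"]

print(round(cohen_kappa(reference, tool), 3))  # 0.478
print(precision_recall(reference, tool)["yellow"])  # (0.5, 1.0)
```

In this toy case, over-triage shows up as reduced precision for the yellow tier while its recall stays high. Unweighted kappa treats a white-versus-yellow disagreement the same as an adjacent-tier one. A linearly or quadratically weighted kappa is a common alternative for ordinal scales, and the abstract does not say which variant was used.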
Results
Fifty-one patients were included. Case severities were distributed as follows: 22/51 white, 27/51 green, and 2/51 yellow, with conjunctivitis (17.5%) and corneal abrasions (12.5%) being the two most common presentations. ASSORT demonstrated moderate agreement with the ophthalmologist (κ = 0.54), whereas Rescue showed stronger concordance (κ = 0.85). ASSORT tended to overestimate urgency, assigning more yellow codes than the ophthalmologist. McNemar’s test indicated a significant systematic difference between ASSORT and the ophthalmologist (p = 0.0156), while Rescue showed no significant deviation (p = 0.5).
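McNemar's test uses only the discordant pairs, so it asks whether disagreements lean systematically in one direction, which complements kappa's measure of overall agreement. The abstract states neither how the three tiers were collapsed into a 2×2 table nor the discordant counts. (Bowker's extension tests symmetry of the full 3×3 table.) As an arithmetic observation only, the exact binomial form of the test reproduces both reported p-values when every discordant pair points the same way: 2 × 0.5^7 = 1/64 ≈ 0.0156 for 7 pairs and 2 × 0.5^2 = 0.5 for 2 pairs. The sketch below assumes that exact form. The counts 7 and 2 are hypothetical inputs, not figures from the paper.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact (binomial) McNemar test on the two discordant-pair counts.

    b: pairs where only the reference rater calls the case positive
    c: pairs where only the tool calls the case positive
    Under H0 a discordant pair is equally likely to land in b or c,
    so min(b, c) follows Binomial(b + c, 0.5).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a systematic shift
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical counts (not reported in the abstract): all discordant pairs
# point the same way, e.g. the tool rating higher than the reference.
print(mcnemar_exact(0, 7))  # 0.015625, i.e. 1/64, about 0.0156
print(mcnemar_exact(0, 2))  # 0.5
```

With very few discordant pairs, as for Rescue here, the exact test has little power, so a non-significant result is weak evidence of calibration rather than proof of it.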
Conclusion
While the small sample size limits generalizability, ASSORT shows potential for AI-driven ophthalmic triage but currently overestimates severity compared with the ophthalmologist. Further refinements, such as reinforcement learning and multimodal input, together with large-scale validation, are needed to improve accuracy and reduce unnecessary emergency classifications before clinical implementation.