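Diagnostic accuracy in dry eye: Insights into clinical and artificial intelligence limitations: Limitations of diagnostic accuracy in dry eye.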

IF 3.7; Zone 3, Medicine; JCR Q1, OPHTHALMOLOGY
Germán Mejía-Salgado, William Rojas-Carabali, Carlos Cifuentes-González, María Andrea Bernal-Valencia, Paola Saboya-Galindo, Jaime Soto-Ariño, Valentina Dumar-Kerguelen, Guillermo Marroquín-Gómez, Martha Lucía Moreno-Pardo, Juliana Tirado-Ángel, Anat Galor, Alejandra de-la-Torre
Contact Lens & Anterior Eye, article 102509. Journal Article, published 2025-09-14.
DOI: 10.1016/j.clae.2025.102509 (https://doi.org/10.1016/j.clae.2025.102509)
Citations: 0

Abstract

Purpose: To evaluate the agreement and performance of four large language models (LLMs), namely ChatGPT-3.5, ChatGPT-4.0, Leny-ai, and MediSearch, in diagnosing and classifying dry eye disease (DED), compared with clinician judgment and the Dry Eye Workshop II (DEWS-II) criteria.

Methods: A standardized prompt was developed that incorporated retrospective clinical and symptom data from patients with suspected DED referred to a dry eye clinic. The LLMs were evaluated on diagnosis (DED vs. no DED) and on subtype classification (aqueous-deficient, evaporative, or mixed-component). Agreement was assessed with Cohen's kappa (Cκ) and Fleiss' kappa (Fκ), and balanced accuracy, sensitivity, specificity, and F1 score were calculated.
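The agreement and accuracy measures named in the Methods can be computed from paired label lists. The following is a minimal sketch using only the Python standard library; the function names (`binary_metrics`, `cohens_kappa`, `fleiss_kappa`) and the label strings are hypothetical, and the abstract does not state which software or averaging scheme the authors actually used.

```python
from collections import Counter


def binary_metrics(reference, predicted, positive="DED"):
    """Sensitivity, specificity, balanced accuracy and F1, treating `positive` as the positive class."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    tn = sum(r != positive and p != positive for r, p in pairs)
    # Each ratio falls back to 0.0 when its denominator is empty
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Balanced accuracy: mean of sensitivity and specificity
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters labelling the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1:
        return float("nan")  # both raters used one identical label throughout: kappa undefined
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Agreement among a fixed number (>= 2) of raters; ratings[i] lists every rater's label for case i."""
    categories = sorted({label for case in ratings for label in case})
    n_cases, n_raters = len(ratings), len(ratings[0])
    counts = [[case.count(c) for c in categories] for case in ratings]
    # P_bar: share of agreeing rater pairs within each case, averaged over cases
    p_bar = sum(
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts
    ) / n_cases
    # P_e: chance agreement from the overall share of each category
    shares = [sum(row[j] for row in counts) / (n_cases * n_raters) for j in range(len(categories))]
    p_e = sum(s * s for s in shares)
    if p_e == 1:
        return float("nan")  # every rating in one category: kappa undefined
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy labels, not study data
clinician = ["DED", "DED", "DED", "no DED"]
model = ["DED", "DED", "DED", "DED"]
print(binary_metrics(clinician, model))
print(cohens_kappa(clinician, model))
```

For subtypes, `binary_metrics` could be applied one-vs-rest (for example with `positive="evaporative"`); this is only one possible choice, since the abstract does not describe how subtype metrics were summarized. A negative Fleiss' kappa, as reported for subtype agreement, means the raters agreed less often than chance alone would predict.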

Results: Among 338 patients (78.6% female; mean age, 53.2 years), clinicians diagnosed DED in 300, and DEWS-II criteria identified it in 234. Against clinician judgment, the LLMs showed high agreement for DED diagnosis (93%-99%; Cκ, 0.81-0.86), but subtype agreement was lower (aqueous-deficient, 0%-18%; evaporative, 4%-80%; mixed-component, 22%-92%; Fκ, -0.20 to -0.10). Diagnostic balanced accuracy was 48%-56%, with high sensitivity (93%-99%) but low specificity (0%-16%). Subtype balanced accuracy ranged from 33% to 81%, and F1 score from 0% to 71%.

Against DEWS-II criteria, agreement for DED diagnosis remained high (96%-99%), but Cκ was weaker (0.52-0.58). Subtype agreement was again low (aqueous-deficient, 0%-20%; evaporative, 9%-68%; mixed-component, 16%-75%; Fκ, -0.09 to -0.02). Diagnostic balanced accuracy was 49%-56%, sensitivity 97%-99%, and specificity 5%-16%. Subtype balanced accuracy ranged from 43% to 56%, and F1 score from 0% to 68%.
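High sensitivity combined with near-zero specificity pulls balanced accuracy toward 50%, even when raw agreement looks high, because most patients in this cohort carry a DED label. As a hypothetical illustration only (not one of the models in the study), the sketch below scores a classifier that calls every patient DED against the clinician reference counts reported above (300 with DED, 38 without). The `summarize` helper is an assumption introduced here and expects both classes to be present in the reference labels.

```python
def summarize(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Binary metrics from a 2x2 confusion matrix; assumes both classes occur in the reference labels."""
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    observed = (tp + tn) / n  # raw percent agreement
    # Chance agreement from the reference and predicted "DED" rates
    ref_pos, pred_pos = (tp + fn) / n, (tp + fp) / n
    expected = ref_pos * pred_pos + (1 - ref_pos) * (1 - pred_pos)
    return {
        "agreement": observed,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0,
        "cohens_kappa": (observed - expected) / (1 - expected) if expected != 1 else float("nan"),
    }


# Hypothetical "always DED" classifier vs. the clinician labels in the abstract:
# 300 patients with DED and 38 without (338 total); every patient is predicted DED.
result = summarize(tp=300, fn=0, fp=38, tn=0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this toy case raw agreement is about 0.888 and F1 about 0.940, yet specificity is 0, balanced accuracy is 0.5, and Cohen's kappa is 0: the classifier agrees with clinicians only as often as chance predicts. This is why balanced accuracy and specificity, rather than raw agreement alone, expose the overdiagnosis risk.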

Conclusion: LLMs showed strong agreement and high sensitivity for DED diagnosis but limited specificity and poor subtype classification, mirroring clinical challenges and highlighting risks of overdiagnosis.

Source journal
CiteScore: 7.60
Self-citation rate: 18.80%
Articles published: 198
Review time: 55 days
About the journal: Contact Lens & Anterior Eye is a research-based journal covering all aspects of contact lens theory and practice, including original articles on invention and innovation, as well as the regular features: Case Reports; Literature Reviews; Editorials; Instrumentation and Techniques; and Dates of Professional Meetings.