Large Language Models Use in Dry Eye Disease: Perplexity AI versus ChatGPT4.

IF 2.3 · Zone 4 (Medicine) · JCR Q2 · Ophthalmology
Sowmya V Kothandan, Stephanie L Watson, Sayan Basu, Swati Singh
Seminars in Ophthalmology, pp. 1-6. Published 2025-08-19. Journal Article. DOI: https://doi.org/10.1080/08820538.2025.2547077

Abstract

Purpose: To compare the utility of two large language models (LLMs) in dry eye disease (DED) clinics and research.

Methods: Trained ocular surface experts generated 12 prompts: 10 questions commonly asked by DED patients and 2 requests for DED research ideas. The experts graded the responses of two LLMs, ChatGPT4 and Perplexity AI, using a standardized grading system (1 = needs improvement, 2 = fair, 3 = good, and 4 = excellent) that evaluated the accuracy, compassion, comprehensiveness, professionalism, humanness, and overall quality of each response. The mean scores of the grades from each expert for each response were compared.
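
The averaging step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the grades, the number of experts (three here), the prompt labels, and the function names are all hypothetical, since the abstract does not report them.

```python
from statistics import mean

# Hypothetical grades on the 1-4 scale (1 = needs improvement ... 4 = excellent).
# grades[model][prompt] holds one grade per expert; all values are made up.
grades = {
    "ChatGPT4": {"Q1": [3, 2, 3], "Q2": [2, 3, 4]},
    "Perplexity AI": {"Q1": [3, 3, 2], "Q2": [2, 2, 3]},
}

def mean_grade_per_response(model_grades):
    """Average the experts' grades for each prompt's response."""
    return {prompt: mean(g) for prompt, g in model_grades.items()}

def overall_mean(model_grades):
    """Average the per-response means to get one score per model."""
    return mean(mean_grade_per_response(model_grades).values())

for model, model_grades in grades.items():
    print(model, round(overall_mean(model_grades), 2))
```

The same two functions would be applied once per graded characteristic (accuracy, compassion, and so on) if each characteristic is scored separately.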

Results: For the 10 clinical DED prompts, the responses from ChatGPT (mean grade score = 2.6) and Perplexity AI (2.7) received similar overall mean quality grades. The mean grade scores for the response characteristics (accuracy, compassion, professionalism, humanness, and succinctness) varied between the experts for each question (range 2.2 to 3.1 for ChatGPT and 2.3 to 3.0 for Perplexity AI). ChatGPT4 scored higher than Perplexity AI for generating DED-related research ideas (mean 3.4 vs. 2.6). The sources cited in Perplexity AI's responses were web pages and were not evidence-based. Agreement between the reviewers' ratings of the response characteristics was slight or poor for both LLMs.
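
The abstract does not name the statistic behind the "slight or poor" agreement. One common choice for several raters assigning categorical grades is Fleiss' kappa, and the sketch below assumes it purely for illustration. The ratings are hypothetical, and the function name `fleiss_kappa` is ours.

```python
def fleiss_kappa(ratings, categories=(1, 2, 3, 4)):
    """Fleiss' kappa for items each graded by the same number of raters.

    ratings: list of items, each a list of grades (one per rater).
    Raises ZeroDivisionError if every grade falls in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who gave item i the j-th category
    counts = [[item.count(c) for c in categories] for item in ratings]
    # Observed agreement for each item, then averaged over items
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Chance agreement from the overall share of each category
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical grades: 4 responses, 3 experts each (values are made up).
example = [[3, 2, 3], [2, 3, 4], [3, 3, 2], [1, 2, 3]]
print(round(fleiss_kappa(example), 3))
```

Under the widely used Landis and Koch labels, values below 0 are described as poor agreement and values from 0 to 0.20 as slight, which matches the wording in the Results.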

Conclusion: Perplexity AI and ChatGPT performed similarly for patient-related queries on DED and could have a role in patient education. These LLMs could have a role in DED clinics for patient counseling but require supervision. The LLMs are not ready to generate dry-eye research ideas or perform literature searches for DED.

Source journal: Seminars in Ophthalmology (Ophthalmology)
CiteScore: 3.20
Self-citation rate: 0.00%
Articles published: 80
Review time: >12 weeks
About the journal: Seminars in Ophthalmology offers current, clinically oriented reviews on the diagnosis and treatment of ophthalmic disorders. Each issue focuses on a single topic, with a primary emphasis on appropriate surgical techniques.