Sowmya V Kothandan, Stephanie L Watson, Sayan Basu, Swati Singh
{"title":"大型语言模型在干眼病中的应用:困惑AI与ChatGPT4。","authors":"Sowmya V Kothandan, Stephanie L Watson, Sayan Basu, Swati Singh","doi":"10.1080/08820538.2025.2547077","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>To compare the utility of two large language models (LLM) in dry eye disease (DED) clinics and research.</p><p><strong>Methods: </strong>Trained ocular surface experts generated 12 prompts for queries commonly asked by DED patients (<i>n</i> = 10 questions) and research ideas for DED (<i>n</i> = 2). Responses of two LLM models, ChatGPT4 and Perplexity AI, were graded by them using a standardized grading system (1 = needs improvement, 2 = fair, 3 = good, and 4 = excellent) evaluating the response accuracy, compassion, comprehensiveness, professionalism, humanness, and overall quality of each response. The mean scores of the grades from each expert for each response were compared.</p><p><strong>Results: </strong>The 10 clinical DED prompts received similar overall mean quality grades for the responses with ChatGPT (mean grade score = 2.6) and Perplexity AI (2.7). The mean grade scores for the response characteristics (accuracy, compassion, professionalism, humanness, and succinctness) varied between the experts for each question (range 2.2 to 3.1 for ChatGPT and 2.3 to 3.0 for Perplexity AI). ChatGPT4 generated DED-related research ideas better than Perplexity AI (mean 3.4 vs. 2.6). The source citations for responses by Perplexity AI were from web pages and were not evidence-based. There was slight or poor agreement between the reviewers' ratings for response characteristics generated by both LLMs.</p><p><strong>Conclusion: </strong>Perplexity AI and ChatGPT performed similarly for patient-related queries on DED and could have a role in patient education. These LLMs could have a role in DED clinics for patient counseling but require supervision. The LLMs are not ready to generate dry-eye research ideas or perform literature searches for DED.</p>","PeriodicalId":21702,"journal":{"name":"Seminars in Ophthalmology","volume":" ","pages":"1-6"},"PeriodicalIF":2.3000,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Large Language Models Use in Dry Eye Disease: Perplexity AI versus ChatGPT4.\",\"authors\":\"Sowmya V Kothandan, Stephanie L Watson, Sayan Basu, Swati Singh\",\"doi\":\"10.1080/08820538.2025.2547077\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>To compare the utility of two large language models (LLM) in dry eye disease (DED) clinics and research.</p><p><strong>Methods: </strong>Trained ocular surface experts generated 12 prompts for queries commonly asked by DED patients (<i>n</i> = 10 questions) and research ideas for DED (<i>n</i> = 2). Responses of two LLM models, ChatGPT4 and Perplexity AI, were graded by them using a standardized grading system (1 = needs improvement, 2 = fair, 3 = good, and 4 = excellent) evaluating the response accuracy, compassion, comprehensiveness, professionalism, humanness, and overall quality of each response. The mean scores of the grades from each expert for each response were compared.</p><p><strong>Results: </strong>The 10 clinical DED prompts received similar overall mean quality grades for the responses with ChatGPT (mean grade score = 2.6) and Perplexity AI (2.7). 
The mean grade scores for the response characteristics (accuracy, compassion, professionalism, humanness, and succinctness) varied between the experts for each question (range 2.2 to 3.1 for ChatGPT and 2.3 to 3.0 for Perplexity AI). ChatGPT4 generated DED-related research ideas better than Perplexity AI (mean 3.4 vs. 2.6). The source citations for responses by Perplexity AI were from web pages and were not evidence-based. There was slight or poor agreement between the reviewers' ratings for response characteristics generated by both LLMs.</p><p><strong>Conclusion: </strong>Perplexity AI and ChatGPT performed similarly for patient-related queries on DED and could have a role in patient education. These LLMs could have a role in DED clinics for patient counseling but require supervision. The LLMs are not ready to generate dry-eye research ideas or perform literature searches for DED.</p>\",\"PeriodicalId\":21702,\"journal\":{\"name\":\"Seminars in Ophthalmology\",\"volume\":\" \",\"pages\":\"1-6\"},\"PeriodicalIF\":2.3000,\"publicationDate\":\"2025-08-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seminars in Ophthalmology\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1080/08820538.2025.2547077\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"OPHTHALMOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seminars in Ophthalmology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1080/08820538.2025.2547077","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"OPHTHALMOLOGY","Score":null,"Total":0}
Large Language Models Use in Dry Eye Disease: Perplexity AI versus ChatGPT4.
Purpose: To compare the utility of two large language models (LLMs) in dry eye disease (DED) clinics and research.
Methods: Trained ocular surface experts generated 12 prompts: queries commonly asked by DED patients (n = 10) and research ideas for DED (n = 2). The experts graded the responses of two LLMs, ChatGPT4 and Perplexity AI, using a standardized grading system (1 = needs improvement, 2 = fair, 3 = good, and 4 = excellent) that evaluated the accuracy, compassion, comprehensiveness, professionalism, humanness, and overall quality of each response. The mean scores of the grades from each expert for each response were compared.
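Although the abstract does not include the analysis itself, a minimal sketch of how per-response mean grade scores could be tabulated and compared between the two models is shown below; the expert grades and prompt labels are hypothetical placeholders, not data from the study.

```python
# A minimal, hypothetical sketch of tabulating per-prompt mean grades and an overall
# mean per model; the actual study data and analysis code are not published here.
from statistics import mean

# grades[model][prompt_id] -> one 1-4 "overall quality" grade per expert (made-up values)
grades = {
    "ChatGPT4":      {"Q1": [3, 2, 3], "Q2": [2, 3, 2]},
    "Perplexity AI": {"Q1": [3, 3, 2], "Q2": [3, 2, 3]},
}

for model, per_prompt in grades.items():
    prompt_means = {q: mean(g) for q, g in per_prompt.items()}   # mean grade per prompt
    overall = mean(prompt_means.values())                        # overall mean across prompts
    print(f"{model}: per-prompt means {prompt_means}, overall mean {overall:.2f}")
```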
Results: Responses to the 10 clinical DED prompts received similar overall mean quality grades with ChatGPT4 (mean grade score = 2.6) and Perplexity AI (2.7). The mean grade scores for the response characteristics (accuracy, compassion, professionalism, humanness, and succinctness) varied between the experts for each question (range 2.2 to 3.1 for ChatGPT4 and 2.3 to 3.0 for Perplexity AI). ChatGPT4 generated DED-related research ideas better than Perplexity AI (mean 3.4 vs. 2.6). The source citations for Perplexity AI's responses came from web pages and were not evidence-based. There was slight or poor agreement between the reviewers' ratings of the response characteristics generated by both LLMs.
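The abstract reports only "slight or poor agreement" between reviewers without naming the statistic used. A common choice for two raters assigning categorical grades is Cohen's kappa; the sketch below uses made-up 1-4 grades purely to illustrate that calculation and is not taken from the study.

```python
# Unweighted Cohen's kappa for two raters over the same items (illustrative only;
# the study's actual agreement statistic and ratings are not given in the abstract).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two raters rating the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 1-4 grades from two experts for the same ten responses
expert_1 = [3, 2, 3, 4, 2, 3, 2, 3, 3, 2]
expert_2 = [2, 2, 3, 3, 2, 4, 2, 2, 3, 3]
print(f"kappa = {cohens_kappa(expert_1, expert_2):.2f}")  # ~0.15, i.e. "slight" agreement
```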
Conclusion: Perplexity AI and ChatGPT4 performed similarly for patient-related queries on DED and could have a role in patient education and counseling in DED clinics, though they require supervision. The LLMs are not ready to generate dry-eye research ideas or perform literature searches for DED.
Journal description:
Seminars in Ophthalmology offers current, clinically oriented reviews on the diagnosis and treatment of ophthalmic disorders. Each issue focuses on a single topic, with a primary emphasis on appropriate surgical techniques.