Evaluation of ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash as Patient Education Resources for Upper Blepharoplasty Patients

IF 1.0 | Zone 4 (Medicine) | JCR Q3 (Surgery)

Journal of Craniofacial Surgery | Pub Date: 2025-07-07 | DOI: 10.1097/SCS.0000000000011608

Suleyman Demir, İsmail Cem Türkeş


Citations: 0

Abstract



Evaluation of ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash as Patient Education Resources for Upper Blepharoplasty Patients.

Purpose: This study aimed to evaluate the effectiveness, accuracy, and readability of the leading large language models (LLMs) from 3 different companies, ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash, as patient education resources for upper blepharoplasty.

Methods: Twenty frequently asked questions about upper blepharoplasty were posed to the 3 LLMs. Two ophthalmologists recorded the responses to the questions and independently evaluated the accuracy of the LLMs using a 5-point Likert scale with scores ranging from 1 to 5. The readability of the analyzed texts was assessed using the SMOG index and the Coleman-Liau index.
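The two readability indices named in the Methods can be sketched in a few lines of Python. This is a minimal illustration using the standard published formulas for the Coleman-Liau index and the SMOG grade; the abstract does not say which software the authors used, so the sentence splitter, the word tokenizer, and the vowel-group syllable counter below are all assumptions made for illustration and will not reproduce the study's exact scores.

```python
import math
import re


def split_sentences(text):
    # Naive split on ., ! and ? (assumption: real tools handle abbreviations better).
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def words(text):
    # Alphabetic tokens, optionally with one internal apostrophe (e.g. "don't").
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def count_syllables(word):
    # Heuristic (assumption): one syllable per group of consecutive vowels,
    # minus one for a likely silent trailing "e".
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def coleman_liau(text):
    # CLI = 0.0588 * L - 0.296 * S - 15.8
    # L = letters per 100 words, S = sentences per 100 words.
    w = words(text)
    letters = sum(len(re.sub(r"[^A-Za-z]", "", x)) for x in w)
    L = letters / len(w) * 100
    S = len(split_sentences(text)) / len(w) * 100
    return 0.0588 * L - 0.296 * S - 15.8


def smog(text):
    # SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291
    # where polysyllables are words with three or more syllables.
    sentences = split_sentences(text)
    poly = sum(1 for x in words(text) if count_syllables(x) >= 3)
    return 1.0430 * math.sqrt(poly * 30 / len(sentences)) + 3.1291
```

Both indices approximate the US school grade level needed to understand a text, so a higher score means a harder text. Calling `coleman_liau(response)` and `smog(response)` on each model's answer and averaging per model would give a comparison of the same kind as the one reported in the Results.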

Results: All models demonstrated high accuracy, with mean Likert scores exceeding 4.5. No statistically significant difference in Likert scores was observed among the 3 models (P=0.097). Claude 3.5 Sonnet generated the most complex responses (Coleman-Liau index: 17.34; SMOG index: 23.82 points), whereas Google Gemini 2.0 Flash produced the most comprehensible texts (Coleman-Liau index: 13.27; SMOG index: 15.04 points).

Conclusion: Large language models hold great promise as tools to educate patients about upper blepharoplasty. Future research should focus on simplifying language models without compromising accuracy, keeping models up-to-date, and minimizing bias to improve patient care and safety.


Source journal: Journal of Craniofacial Surgery (Medicine: Surgery)

CiteScore: 1.70

Self-citation rate: 11.10%

Articles published: 968

Review time: 1.5 months

Journal description: The Journal of Craniofacial Surgery serves as a forum of communication for all those involved in craniofacial surgery, maxillofacial surgery and pediatric plastic surgery. Coverage ranges from practical aspects of craniofacial surgery to the basic science that underlies surgical practice. The journal publishes original articles, scientific reviews, editorials and invited commentary, abstracts and selected articles from international journals, and occasional international bibliographies in craniofacial surgery.