Evaluation of ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash as Patient Education Resources for Upper Blepharoplasty Patients

Impact factor 1.0 · Zone 4 (Medicine) · JCR Q3 (Surgery)
Suleyman Demir, İsmail Cem Türkeş
Journal: Journal of Craniofacial Surgery
Publication date: 2025-07-07
Publication type: Journal Article
DOI: 10.1097/SCS.0000000000011608 (https://doi.org/10.1097/SCS.0000000000011608)
Citations: 0

Abstract

Evaluation of ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash as Patient Education Resources for Upper Blepharoplasty Patients.

Purpose: This study aimed to evaluate the effectiveness, accuracy, and readability of the leading large language models (LLMs) from 3 different companies, ChatGPT-4o, Claude 3.5 Sonnet, and Google Gemini 2.0 Flash, as patient education resources for upper blepharoplasty.

Methods: Twenty frequently asked questions about upper blepharoplasty were posed to the 3 LLMs. Two ophthalmologists recorded the responses to the questions and independently evaluated the accuracy of the LLMs using a 5-point Likert scale with scores ranging from 1 to 5. The readability of the analyzed texts was assessed using the SMOG index and the Coleman-Liau index.
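The two readability indices named in the Methods can be sketched as follows. This is a minimal illustration using the standard published formulas for the Coleman-Liau index and the SMOG grade; the abstract does not state which tool the authors used, so the function names, the regex-based sentence and word splitting, and the vowel-group syllable counter are all assumptions made for this sketch. Dedicated readability tools count syllables more carefully and will give somewhat different scores.

```python
import math
import re


def split_sentences(text):
    # Assumption: a sentence ends at one or more of . ! ?
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(text):
    # Assumption: a word is a run of ASCII letters.
    return re.findall(r"[A-Za-z]+", text)


def count_syllables(word):
    # Crude heuristic (not from the paper): one syllable per vowel group.
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def coleman_liau(text):
    # CLI = 0.0588 * L - 0.296 * S - 15.8, where
    # L = letters per 100 words and S = sentences per 100 words.
    words = split_words(text)
    sentences = split_sentences(text)
    letters = sum(len(w) for w in words)
    L = letters / len(words) * 100
    S = len(sentences) / len(words) * 100
    return 0.0588 * L - 0.296 * S - 15.8


def smog(text):
    # SMOG grade = 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291,
    # where a polysyllable is a word with three or more syllables.
    sentences = split_sentences(text)
    polysyllables = sum(1 for w in split_words(text) if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(polysyllables * 30 / len(sentences)) + 3.1291


if __name__ == "__main__":
    sample = "Upper blepharoplasty removes excess eyelid skin. Recovery usually takes two weeks."
    print(f"Coleman-Liau: {coleman_liau(sample):.2f}")
    print(f"SMOG: {smog(sample):.2f}")
```

Both indices approximate a school grade level, so higher values indicate text that is harder to read. That is the sense in which the Results describe one model's responses as more complex than another's.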

Results: All models demonstrated high accuracy, with mean Likert scores exceeding 4.5. No statistically significant difference in Likert scores was observed among the 3 models (P=0.097). Claude 3.5 Sonnet generated the most complex responses (Coleman-Liau index: 17.34; SMOG index: 23.82 points), whereas Google Gemini 2.0 Flash produced the most comprehensible texts (Coleman-Liau index: 13.27; SMOG index: 15.04 points).

Conclusion: Large language models hold great promise as tools to educate patients about upper blepharoplasty. Future research should focus on simplifying language models without compromising accuracy, keeping models up-to-date, and minimizing bias to improve patient care and safety.

Source journal
CiteScore: 1.70
Self-citation rate: 11.10%
Articles published: 968
Review time: 1.5 months
Journal description: The Journal of Craniofacial Surgery serves as a forum of communication for all those involved in craniofacial surgery, maxillofacial surgery and pediatric plastic surgery. Coverage ranges from practical aspects of craniofacial surgery to the basic science that underlies surgical practice. The journal publishes original articles, scientific reviews, editorials and invited commentary, abstracts and selected articles from international journals, and occasional international bibliographies in craniofacial surgery.