Artificial Intelligence Can Answer Postoperative Questions About Distal Radius Fractures—But Can Patients Understand the Answers?

Q3 Medicine
Rae Tarapore MD, Suhasini Gupta MD, Kenneth R. Means Jr MD, Aviram M. Giladi MD, MS
{"title":"人工智能可以回答桡骨远端骨折术后的问题——但患者能理解答案吗?","authors":"Rae Tarapore MD ,&nbsp;Suhasini Gupta MD ,&nbsp;Kenneth R. Means Jr MD ,&nbsp;Aviram M. Giladi MD, MS","doi":"10.1016/j.jhsg.2025.100822","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>The purpose of this study was to assess the validity, reliability, and readability of responses to common patient questions about postoperative from ChatGPT, Microsoft Copilot, and Google Gemini.</div></div><div><h3>Methods</h3><div>Twenty-seven thoroughly vetted questions regarding distal radius fractures repair surgery were compiled and entered into ChatGPT 4, Gemini, and Copilot. The responses were analyzed for quality, accuracy, and readability using the DISCERN scale, the Journal of the American Medical Association benchmark criteria, Flesch-Kincaid Reading Ease Score, and Flesch-Kincaid Grade Level. Citations provided by Google Gemini and Microsoft Copilot were further categorized by source of reference. Five questions were resubmitted, requesting response simplification. The responses were re-evaluated using the same metrics.</div></div><div><h3>Results</h3><div>All three artificial intelligence platforms produced answers that were considered “good” quality (DISCERN scores &gt;50). Copilot had the highest quality of information (68.3), followed by Gemini (62.9) and ChatGPT (52.9). The information provided by Copilot demonstrated the highest reliability, with a Journal of the American Medical Association benchmark criterion of 3 (of 4) compared with Gemini (1) and ChatGPT (0). All three platforms generated complex texts with Flesch-Kincaid Reading Ease Scores ranging between 35.8 and 41.4 and Flesch-Kincaid Grade Level scores between 10.5 and 12.1, indicating a minimum of high-school graduate reading level required. After simplification, Gemini’s reading level remained unchanged, whereas ChatGPT improved to that of a seventh-grade reading level and Copilot improved to that of an eighth-grade reading level. Copilot had a higher number of references (74) compared with Gemini (36).</div></div><div><h3>Conclusions</h3><div>All three platforms provided safe and reliable answers to postoperative questions about distal radius fractures. High reading levels provided by AI remain the biggest barrier to patient accessibility.</div></div><div><h3>Clinical relevance</h3><div>For the current state of mainstream AI platforms, they are best suited as adjunct tools to support, rather than replace, clinical communication from health care workers.</div></div>","PeriodicalId":36920,"journal":{"name":"Journal of Hand Surgery Global Online","volume":"7 6","pages":"Article 100822"},"PeriodicalIF":0.0000,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Artificial Intelligence Can Answer Postoperative Questions About Distal Radius Fractures—But Can Patients Understand the Answers?\",\"authors\":\"Rae Tarapore MD ,&nbsp;Suhasini Gupta MD ,&nbsp;Kenneth R. Means Jr MD ,&nbsp;Aviram M. 
Giladi MD, MS\",\"doi\":\"10.1016/j.jhsg.2025.100822\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><div>The purpose of this study was to assess the validity, reliability, and readability of responses to common patient questions about postoperative from ChatGPT, Microsoft Copilot, and Google Gemini.</div></div><div><h3>Methods</h3><div>Twenty-seven thoroughly vetted questions regarding distal radius fractures repair surgery were compiled and entered into ChatGPT 4, Gemini, and Copilot. The responses were analyzed for quality, accuracy, and readability using the DISCERN scale, the Journal of the American Medical Association benchmark criteria, Flesch-Kincaid Reading Ease Score, and Flesch-Kincaid Grade Level. Citations provided by Google Gemini and Microsoft Copilot were further categorized by source of reference. Five questions were resubmitted, requesting response simplification. The responses were re-evaluated using the same metrics.</div></div><div><h3>Results</h3><div>All three artificial intelligence platforms produced answers that were considered “good” quality (DISCERN scores &gt;50). Copilot had the highest quality of information (68.3), followed by Gemini (62.9) and ChatGPT (52.9). The information provided by Copilot demonstrated the highest reliability, with a Journal of the American Medical Association benchmark criterion of 3 (of 4) compared with Gemini (1) and ChatGPT (0). All three platforms generated complex texts with Flesch-Kincaid Reading Ease Scores ranging between 35.8 and 41.4 and Flesch-Kincaid Grade Level scores between 10.5 and 12.1, indicating a minimum of high-school graduate reading level required. After simplification, Gemini’s reading level remained unchanged, whereas ChatGPT improved to that of a seventh-grade reading level and Copilot improved to that of an eighth-grade reading level. Copilot had a higher number of references (74) compared with Gemini (36).</div></div><div><h3>Conclusions</h3><div>All three platforms provided safe and reliable answers to postoperative questions about distal radius fractures. 
High reading levels provided by AI remain the biggest barrier to patient accessibility.</div></div><div><h3>Clinical relevance</h3><div>For the current state of mainstream AI platforms, they are best suited as adjunct tools to support, rather than replace, clinical communication from health care workers.</div></div>\",\"PeriodicalId\":36920,\"journal\":{\"name\":\"Journal of Hand Surgery Global Online\",\"volume\":\"7 6\",\"pages\":\"Article 100822\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Hand Surgery Global Online\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2589514125001422\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Hand Surgery Global Online","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2589514125001422","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
Citations: 0

Abstract


Purpose

The purpose of this study was to assess the validity, reliability, and readability of responses from ChatGPT, Microsoft Copilot, and Google Gemini to common postoperative patient questions about distal radius fracture repair.

Methods

Twenty-seven thoroughly vetted questions regarding distal radius fracture repair surgery were compiled and entered into ChatGPT 4, Gemini, and Copilot. The responses were analyzed for quality, accuracy, and readability using the DISCERN scale, the Journal of the American Medical Association benchmark criteria, the Flesch-Kincaid Reading Ease Score, and the Flesch-Kincaid Grade Level. Citations provided by Google Gemini and Microsoft Copilot were further categorized by source of reference. Five questions were resubmitted with a request to simplify the responses, and the simplified responses were re-evaluated using the same metrics.
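
The abstract does not state which tool was used to compute the readability metrics. As a rough illustration of what the Flesch-Kincaid measures capture, the sketch below applies the standard published formulas to a sample response; the syllable counter is a crude heuristic, and the function names and sample text are chosen here for illustration rather than taken from the study.

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, with a silent-'e' adjustment."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Reading Ease, Grade Level) using the standard Flesch-Kincaid formulas."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / max(len(sentences), 1)
    syllables_per_word = syllables / max(len(words), 1)
    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return reading_ease, grade_level

# Score a hypothetical platform response to a postoperative question
sample = "Keep the splint clean and dry. Elevate your hand above heart level to reduce swelling."
ease, grade = flesch_scores(sample)
print(f"Flesch Reading Ease: {ease:.1f}, Flesch-Kincaid Grade Level: {grade:.1f}")
```

Lower Reading Ease scores and higher Grade Level scores both indicate harder text, which is how the ranges reported in the Results should be read.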

Results

All three artificial intelligence platforms produced answers considered “good” quality (DISCERN scores >50). Copilot had the highest quality of information (68.3), followed by Gemini (62.9) and ChatGPT (52.9). The information provided by Copilot was also the most reliable, meeting 3 of the 4 Journal of the American Medical Association benchmark criteria, compared with 1 for Gemini and 0 for ChatGPT. All three platforms generated complex text, with Flesch-Kincaid Reading Ease Scores between 35.8 and 41.4 and Flesch-Kincaid Grade Level scores between 10.5 and 12.1, indicating that at least a high-school graduate reading level is required. After simplification, Gemini’s reading level remained unchanged, whereas ChatGPT improved to a seventh-grade reading level and Copilot to an eighth-grade reading level. Copilot provided more references (74) than Gemini (36).
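
For context on these numbers, the Flesch-Kincaid Grade Level approximates the United States school grade needed to follow a text, and patient-education guidance commonly recommends roughly a sixth-grade level. The minimal sketch below (the cut-offs are conventional assumptions, not values from the study) maps the endpoints of the reported 10.5 to 12.1 range to a schooling band.

```python
def grade_band(fk_grade: float) -> str:
    """Map a Flesch-Kincaid Grade Level to a rough schooling band (assumed conventional cut-offs)."""
    if fk_grade < 7:
        return "within the commonly recommended range for patient materials"
    if fk_grade < 9:
        return "middle-school reading level"
    if fk_grade < 13:
        return "high-school reading level"
    return "college reading level"

# Endpoints of the range reported across the three platforms
for grade in (10.5, 12.1):
    print(f"FKGL {grade}: {grade_band(grade)}")
```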

Conclusions

All three platforms provided safe and reliable answers to postoperative questions about distal radius fractures. The high reading level of AI-generated responses remains the biggest barrier to patient accessibility.

Clinical relevance

In their current state, mainstream AI platforms are best suited as adjunct tools that support, rather than replace, clinical communication from health care workers.
Source journal: Journal of Hand Surgery Global Online
CiteScore: 1.10
Self-citation rate: 0.00%
Articles published: 111
Review turnaround: 12 weeks