Assessing Accuracy of Chat Generative Pre-Trained Transformer’s Responses to Common Patient Questions Regarding Congenital Upper Limb Differences

Niklaus P. Zeller BS , Ayush D. Shah BA , Ann E. Van Heest MD , Deborah C. Bohn MD
{"title":"评估聊天生成预训练变压器对有关先天性上肢差异的常见患者问题的反应的准确性","authors":"Niklaus P. Zeller BS ,&nbsp;Ayush D. Shah BA ,&nbsp;Ann E. Van Heest MD ,&nbsp;Deborah C. Bohn MD","doi":"10.1016/j.jhsg.2025.100764","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>The purpose was to assess the ability of Chat Generative Pre-Trained Transformer (ChatGPT) 4.0 to accurately and reliably answer patients’ frequently asked questions (FAQs) about congenital upper limb differences (CULDs) and their treatment options.</div></div><div><h3>Methods</h3><div>Two pediatric hand surgeons were queried regarding FAQs they receive from parents about CULDs. Sixteen FAQs were input to ChatGPT-4.0 for the following conditions: (1) syndactyly, (2) polydactyly, (3) radial longitudinal deficiency, (4) thumb hypoplasia, and (5) general congenital hand differences. Two additional psychosocial care questions were queried, and all responses were graded by the surgeons using a scale of 1–4, based on the quality of the response. Independent chats were used for each question to reduce memory–retention bias with no pretraining of the software application.</div></div><div><h3>Results</h3><div>Overall, ChatGPT provided relatively reliable, evidence-based responses to the 16 queried FAQs. In total, 164 grades were assigned to the 82 ChatGPT responses: 83 (51%) did not require any clarification, 37 (23%) required minimal clarification, 32 (20%) required moderate clarification, and 13 (8%) received an unsatisfactory rating. However, there was considerable variability in the depth of many responses. When queried on medical associations with syndactyly and polydactyly, ChatGPT provided a detailed account of associated syndromes, although there was no mention that syndromic involvement is relatively rare. Furthermore, ChatGPT recommended that the patients consult a health care provider for individualized care 81 times in 49 responses. It commonly “referred” patients to genetic counselors (n = 26, 32%), followed by pediatric orthopedic surgeons and orthopedic surgeons (n = 16, 20%), and hand surgeons (n = 9, 11%).</div></div><div><h3>Conclusions</h3><div>Chat Generative Pre-Trained Transformer provided evidence-based responses not requiring clarification to a majority of FAQs about CULDs. However, there was considerable variation across the responses, and it rarely “referred” patients to hand surgeons. As new tools for patient education, ChatGPT and similar large language models should be approached cautiously when seeking information about CULDs. Responses do not consistently provide comprehensive, individualized information. 8% of responses were misguiding.</div></div><div><h3>Type of study/level of evidence</h3><div>Economic/decision analysis IIC.</div></div>","PeriodicalId":36920,"journal":{"name":"Journal of Hand Surgery Global Online","volume":"7 4","pages":"Article 100764"},"PeriodicalIF":0.0000,"publicationDate":"2025-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessing Accuracy of Chat Generative Pre-Trained Transformer’s Responses to Common Patient Questions Regarding Congenital Upper Limb Differences\",\"authors\":\"Niklaus P. Zeller BS ,&nbsp;Ayush D. Shah BA ,&nbsp;Ann E. Van Heest MD ,&nbsp;Deborah C. 
Bohn MD\",\"doi\":\"10.1016/j.jhsg.2025.100764\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><div>The purpose was to assess the ability of Chat Generative Pre-Trained Transformer (ChatGPT) 4.0 to accurately and reliably answer patients’ frequently asked questions (FAQs) about congenital upper limb differences (CULDs) and their treatment options.</div></div><div><h3>Methods</h3><div>Two pediatric hand surgeons were queried regarding FAQs they receive from parents about CULDs. Sixteen FAQs were input to ChatGPT-4.0 for the following conditions: (1) syndactyly, (2) polydactyly, (3) radial longitudinal deficiency, (4) thumb hypoplasia, and (5) general congenital hand differences. Two additional psychosocial care questions were queried, and all responses were graded by the surgeons using a scale of 1–4, based on the quality of the response. Independent chats were used for each question to reduce memory–retention bias with no pretraining of the software application.</div></div><div><h3>Results</h3><div>Overall, ChatGPT provided relatively reliable, evidence-based responses to the 16 queried FAQs. In total, 164 grades were assigned to the 82 ChatGPT responses: 83 (51%) did not require any clarification, 37 (23%) required minimal clarification, 32 (20%) required moderate clarification, and 13 (8%) received an unsatisfactory rating. However, there was considerable variability in the depth of many responses. When queried on medical associations with syndactyly and polydactyly, ChatGPT provided a detailed account of associated syndromes, although there was no mention that syndromic involvement is relatively rare. Furthermore, ChatGPT recommended that the patients consult a health care provider for individualized care 81 times in 49 responses. It commonly “referred” patients to genetic counselors (n = 26, 32%), followed by pediatric orthopedic surgeons and orthopedic surgeons (n = 16, 20%), and hand surgeons (n = 9, 11%).</div></div><div><h3>Conclusions</h3><div>Chat Generative Pre-Trained Transformer provided evidence-based responses not requiring clarification to a majority of FAQs about CULDs. However, there was considerable variation across the responses, and it rarely “referred” patients to hand surgeons. As new tools for patient education, ChatGPT and similar large language models should be approached cautiously when seeking information about CULDs. Responses do not consistently provide comprehensive, individualized information. 
8% of responses were misguiding.</div></div><div><h3>Type of study/level of evidence</h3><div>Economic/decision analysis IIC.</div></div>\",\"PeriodicalId\":36920,\"journal\":{\"name\":\"Journal of Hand Surgery Global Online\",\"volume\":\"7 4\",\"pages\":\"Article 100764\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Hand Surgery Global Online\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2589514125000842\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"Medicine\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Hand Surgery Global Online","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2589514125000842","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
引用次数: 0


Purpose

The purpose was to assess the ability of Chat Generative Pre-Trained Transformer (ChatGPT) 4.0 to accurately and reliably answer patients’ frequently asked questions (FAQs) about congenital upper limb differences (CULDs) and their treatment options.

Methods

Two pediatric hand surgeons were queried regarding the FAQs they receive from parents about CULDs. Sixteen FAQs were input to ChatGPT-4.0 for each of the following conditions: (1) syndactyly, (2) polydactyly, (3) radial longitudinal deficiency, (4) thumb hypoplasia, and (5) general congenital hand differences. Two additional psychosocial care questions were also queried, and all responses were graded by the surgeons on a scale of 1–4 based on the quality of the response. An independent chat, with no pretraining of the software application, was used for each question to reduce memory-retention bias.
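
This independent-chat protocol can be approximated programmatically. The sketch below is a minimal reproduction, assuming the OpenAI Python client and the "gpt-4" model name as a stand-in for ChatGPT-4.0 (the study queried the ChatGPT application directly); the example questions are hypothetical, not the study's actual FAQ list.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical FAQ examples; the study's own question list is not reproduced here.
questions = [
    "What causes syndactyly, and is it usually part of a syndrome?",
    "When should polydactyly be treated surgically?",
]

responses = {}
for q in questions:
    # A fresh message list per question: no shared history and no system
    # prompt, mirroring "independent chats ... with no pretraining".
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": q}],
    )
    responses[q] = reply.choices[0].message.content

Each response collected this way would then be graded independently by the two surgeon raters on the 1–4 scale described above.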

Results

Overall, ChatGPT provided relatively reliable, evidence-based responses to the 16 queried FAQs. In total, 164 grades (two graders each rating the 82 ChatGPT responses) were assigned: 83 (51%) did not require any clarification, 37 (23%) required minimal clarification, 32 (20%) required moderate clarification, and 13 (8%) received an unsatisfactory rating. However, there was considerable variability in the depth of many responses. When queried on medical associations with syndactyly and polydactyly, ChatGPT provided a detailed account of associated syndromes, although it did not mention that syndromic involvement is relatively rare. Furthermore, ChatGPT recommended that patients consult a health care provider for individualized care 81 times across 49 responses. It most commonly “referred” patients to genetic counselors (n = 26, 32%), followed by pediatric orthopedic surgeons and orthopedic surgeons (n = 16, 20%) and hand surgeons (n = 9, 11%).
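
As a worked check of the referral arithmetic, each percentage above is that specialty's share of the 81 recommendations to consult a health care provider (a minimal sketch; the counts are taken directly from the Results):

# Referral counts from the Results; percentages are shares of the 81
# recommendations to consult a health care provider.
referrals = {
    "genetic counselors": 26,
    "pediatric orthopedic and orthopedic surgeons": 16,
    "hand surgeons": 9,
}
total_recommendations = 81
for specialty, n in referrals.items():
    print(f"{specialty}: n = {n} ({n / total_recommendations:.0%})")
# genetic counselors: n = 26 (32%)
# pediatric orthopedic and orthopedic surgeons: n = 16 (20%)
# hand surgeons: n = 9 (11%)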

Conclusions

Chat Generative Pre-Trained Transformer provided evidence-based responses not requiring clarification to a majority of FAQs about CULDs. However, there was considerable variation across the responses, and it rarely “referred” patients to hand surgeons. As new tools for patient education, ChatGPT and similar large language models should be approached cautiously when seeking information about CULDs: responses do not consistently provide comprehensive, individualized information, and 8% of responses were misleading.

Type of study/level of evidence

Economic/decision analysis IIC.