Use of ChatGPT to Generate Informed Consent for Surgery in Urogynecology.

IF 0.8 Q4 OBSTETRICS & GYNECOLOGY
Emily S Johnson, Eva K Welch, Jacqueline Kikuchi, Heather Barbier, Christine M Vaccaro, Felicia Balzano, Katherine L Dengler
{"title":"使用ChatGPT生成泌尿妇科手术知情同意。","authors":"Emily S Johnson, Eva K Welch, Jacqueline Kikuchi, Heather Barbier, Christine M Vaccaro, Felicia Balzano, Katherine L Dengler","doi":"10.1097/SPV.0000000000001638","DOIUrl":null,"url":null,"abstract":"<p><strong>Importance: </strong>Use of the publicly available Large Language Model, Chat Generative Pre-trained Transformer (ChatGPT 3.5; OpenAI, 2022), is growing in health care despite varying accuracies.</p><p><strong>Objective: </strong>The aim of this study was to assess the accuracy and readability of ChatGPT's responses to questions encompassing surgical informed consent in urogynecology.</p><p><strong>Study design: </strong>Five fellowship-trained urogynecology attending physicians and 1 reconstructive female urologist evaluated ChatGPT's responses to questions about 4 surgical procedures: (1) retropubic midurethral sling, (2) total vaginal hysterectomy, (3) uterosacral ligament suspension, and (4) sacrocolpopexy. Questions involved procedure descriptions, risks/benefits/alternatives, and additional resources. Responses were rated using the DISCERN tool, a 4-point accuracy scale, and the Flesch-Kinkaid Grade Level score.</p><p><strong>Results: </strong>The median DISCERN tool overall rating was 3 (interquartile range [IQR], 3-4), indicating a moderate rating (\"potentially important but not serious shortcomings\"). Retropubic midurethral sling received the highest overall score (median, 4; IQR, 3-4), and uterosacral ligament suspension received the lowest (median, 3; IQR, 3-3). Using the 4-point accuracy scale, 44.0% of responses received a score of 4 (\"correct and adequate\"), 22.6% received a score of 3 (\"correct but insufficient\"), 29.8% received a score of 2 (\"accurate and misleading information together\"), and 3.6% received a score of 1 (\"wrong or irrelevant answer\"). ChatGPT performance was poor for discussion of benefits and alternatives for all surgical procedures, with some responses being inaccurate. The mean Flesch-Kinkaid Grade Level score for all responses was 17.5 (SD, 2.1), corresponding to a postgraduate reading level.</p><p><strong>Conclusions: </strong>Overall, ChatGPT generated accurate responses to questions about surgical informed consent. 
However, it produced clearly false portions of responses, highlighting the need for a careful review of responses by qualified health care professionals.</p>","PeriodicalId":75288,"journal":{"name":"Urogynecology (Hagerstown, Md.)","volume":" ","pages":""},"PeriodicalIF":0.8000,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Use of ChatGPT to Generate Informed Consent for Surgery in Urogynecology.\",\"authors\":\"Emily S Johnson, Eva K Welch, Jacqueline Kikuchi, Heather Barbier, Christine M Vaccaro, Felicia Balzano, Katherine L Dengler\",\"doi\":\"10.1097/SPV.0000000000001638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Importance: </strong>Use of the publicly available Large Language Model, Chat Generative Pre-trained Transformer (ChatGPT 3.5; OpenAI, 2022), is growing in health care despite varying accuracies.</p><p><strong>Objective: </strong>The aim of this study was to assess the accuracy and readability of ChatGPT's responses to questions encompassing surgical informed consent in urogynecology.</p><p><strong>Study design: </strong>Five fellowship-trained urogynecology attending physicians and 1 reconstructive female urologist evaluated ChatGPT's responses to questions about 4 surgical procedures: (1) retropubic midurethral sling, (2) total vaginal hysterectomy, (3) uterosacral ligament suspension, and (4) sacrocolpopexy. Questions involved procedure descriptions, risks/benefits/alternatives, and additional resources. Responses were rated using the DISCERN tool, a 4-point accuracy scale, and the Flesch-Kinkaid Grade Level score.</p><p><strong>Results: </strong>The median DISCERN tool overall rating was 3 (interquartile range [IQR], 3-4), indicating a moderate rating (\\\"potentially important but not serious shortcomings\\\"). Retropubic midurethral sling received the highest overall score (median, 4; IQR, 3-4), and uterosacral ligament suspension received the lowest (median, 3; IQR, 3-3). Using the 4-point accuracy scale, 44.0% of responses received a score of 4 (\\\"correct and adequate\\\"), 22.6% received a score of 3 (\\\"correct but insufficient\\\"), 29.8% received a score of 2 (\\\"accurate and misleading information together\\\"), and 3.6% received a score of 1 (\\\"wrong or irrelevant answer\\\"). ChatGPT performance was poor for discussion of benefits and alternatives for all surgical procedures, with some responses being inaccurate. The mean Flesch-Kinkaid Grade Level score for all responses was 17.5 (SD, 2.1), corresponding to a postgraduate reading level.</p><p><strong>Conclusions: </strong>Overall, ChatGPT generated accurate responses to questions about surgical informed consent. 
However, it produced clearly false portions of responses, highlighting the need for a careful review of responses by qualified health care professionals.</p>\",\"PeriodicalId\":75288,\"journal\":{\"name\":\"Urogynecology (Hagerstown, Md.)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2025-01-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Urogynecology (Hagerstown, Md.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1097/SPV.0000000000001638\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"OBSTETRICS & GYNECOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Urogynecology (Hagerstown, Md.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1097/SPV.0000000000001638","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OBSTETRICS & GYNECOLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Importance: Use of the publicly available large language model Chat Generative Pre-trained Transformer (ChatGPT 3.5; OpenAI, 2022) is growing in health care despite variable accuracy.

Objective: The aim of this study was to assess the accuracy and readability of ChatGPT's responses to questions encompassing surgical informed consent in urogynecology.

Study design: Five fellowship-trained urogynecology attending physicians and 1 reconstructive female urologist evaluated ChatGPT's responses to questions about 4 surgical procedures: (1) retropubic midurethral sling, (2) total vaginal hysterectomy, (3) uterosacral ligament suspension, and (4) sacrocolpopexy. Questions involved procedure descriptions, risks/benefits/alternatives, and additional resources. Responses were rated using the DISCERN tool, a 4-point accuracy scale, and the Flesch-Kincaid Grade Level score.
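For context, the Flesch-Kincaid Grade Level is a fixed formula over word and sentence counts: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The Python sketch below is a minimal illustration using a naive vowel-group syllable heuristic; the study's exact readability tooling is not specified in the abstract.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups (heuristic only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Example: a consent-style sentence scores well above typical
# patient-education targets (roughly grade 6-8).
print(round(flesch_kincaid_grade(
    "The retropubic midurethral sling is a surgical procedure "
    "indicated for the management of stress urinary incontinence."), 1))
```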

Results: The median DISCERN tool overall rating was 3 (interquartile range [IQR], 3-4), indicating a moderate rating ("potentially important but not serious shortcomings"). Retropubic midurethral sling received the highest overall score (median, 4; IQR, 3-4), and uterosacral ligament suspension received the lowest (median, 3; IQR, 3-3). Using the 4-point accuracy scale, 44.0% of responses received a score of 4 ("correct and adequate"), 22.6% received a score of 3 ("correct but insufficient"), 29.8% received a score of 2 ("accurate and misleading information together"), and 3.6% received a score of 1 ("wrong or irrelevant answer"). ChatGPT performance was poor for discussion of benefits and alternatives for all surgical procedures, with some responses being inaccurate. The mean Flesch-Kincaid Grade Level score for all responses was 17.5 (SD, 2.1), corresponding to a postgraduate reading level.
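The median/IQR summaries above are mechanical to reproduce. As a minimal sketch with NumPy, using hypothetical ratings (the study's raw per-reviewer data are not reported in the abstract):

```python
import numpy as np

# Eight hypothetical 1-5 DISCERN overall ratings, for illustration only;
# the study's raw rating data are not published in the abstract.
ratings = np.array([3, 3, 3, 3, 3, 4, 4, 4])

median = np.median(ratings)
q1, q3 = np.percentile(ratings, [25, 75])  # linearly interpolated quartiles
print(f"median, {median:g}; IQR, {q1:g}-{q3:g}")  # -> median, 3; IQR, 3-4
```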

Conclusions: Overall, ChatGPT generated accurate responses to questions about surgical informed consent. However, portions of some responses were clearly false, highlighting the need for careful review of responses by qualified health care professionals.
