Use of ChatGPT to Generate Informed Consent for Surgery in Urogynecology.

IF 0.8 Q4 OBSTETRICS & GYNECOLOGY
Emily S Johnson, Eva K Welch, Jacqueline Kikuchi, Heather Barbier, Christine M Vaccaro, Felicia Balzano, Katherine L Dengler
{"title":"使用ChatGPT生成泌尿妇科手术知情同意。","authors":"Emily S Johnson, Eva K Welch, Jacqueline Kikuchi, Heather Barbier, Christine M Vaccaro, Felicia Balzano, Katherine L Dengler","doi":"10.1097/SPV.0000000000001638","DOIUrl":null,"url":null,"abstract":"<p><strong>Importance: </strong>Use of the publicly available Large Language Model, Chat Generative Pre-trained Transformer (ChatGPT 3.5; OpenAI, 2022), is growing in health care despite varying accuracies.</p><p><strong>Objective: </strong>The aim of this study was to assess the accuracy and readability of ChatGPT's responses to questions encompassing surgical informed consent in urogynecology.</p><p><strong>Study design: </strong>Five fellowship-trained urogynecology attending physicians and 1 reconstructive female urologist evaluated ChatGPT's responses to questions about 4 surgical procedures: (1) retropubic midurethral sling, (2) total vaginal hysterectomy, (3) uterosacral ligament suspension, and (4) sacrocolpopexy. Questions involved procedure descriptions, risks/benefits/alternatives, and additional resources. Responses were rated using the DISCERN tool, a 4-point accuracy scale, and the Flesch-Kinkaid Grade Level score.</p><p><strong>Results: </strong>The median DISCERN tool overall rating was 3 (interquartile range [IQR], 3-4), indicating a moderate rating (\"potentially important but not serious shortcomings\"). Retropubic midurethral sling received the highest overall score (median, 4; IQR, 3-4), and uterosacral ligament suspension received the lowest (median, 3; IQR, 3-3). Using the 4-point accuracy scale, 44.0% of responses received a score of 4 (\"correct and adequate\"), 22.6% received a score of 3 (\"correct but insufficient\"), 29.8% received a score of 2 (\"accurate and misleading information together\"), and 3.6% received a score of 1 (\"wrong or irrelevant answer\"). ChatGPT performance was poor for discussion of benefits and alternatives for all surgical procedures, with some responses being inaccurate. The mean Flesch-Kinkaid Grade Level score for all responses was 17.5 (SD, 2.1), corresponding to a postgraduate reading level.</p><p><strong>Conclusions: </strong>Overall, ChatGPT generated accurate responses to questions about surgical informed consent. 
However, it produced clearly false portions of responses, highlighting the need for a careful review of responses by qualified health care professionals.</p>","PeriodicalId":75288,"journal":{"name":"Urogynecology (Hagerstown, Md.)","volume":" ","pages":""},"PeriodicalIF":0.8000,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Use of ChatGPT to Generate Informed Consent for Surgery in Urogynecology.\",\"authors\":\"Emily S Johnson, Eva K Welch, Jacqueline Kikuchi, Heather Barbier, Christine M Vaccaro, Felicia Balzano, Katherine L Dengler\",\"doi\":\"10.1097/SPV.0000000000001638\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Importance: </strong>Use of the publicly available Large Language Model, Chat Generative Pre-trained Transformer (ChatGPT 3.5; OpenAI, 2022), is growing in health care despite varying accuracies.</p><p><strong>Objective: </strong>The aim of this study was to assess the accuracy and readability of ChatGPT's responses to questions encompassing surgical informed consent in urogynecology.</p><p><strong>Study design: </strong>Five fellowship-trained urogynecology attending physicians and 1 reconstructive female urologist evaluated ChatGPT's responses to questions about 4 surgical procedures: (1) retropubic midurethral sling, (2) total vaginal hysterectomy, (3) uterosacral ligament suspension, and (4) sacrocolpopexy. Questions involved procedure descriptions, risks/benefits/alternatives, and additional resources. Responses were rated using the DISCERN tool, a 4-point accuracy scale, and the Flesch-Kinkaid Grade Level score.</p><p><strong>Results: </strong>The median DISCERN tool overall rating was 3 (interquartile range [IQR], 3-4), indicating a moderate rating (\\\"potentially important but not serious shortcomings\\\"). Retropubic midurethral sling received the highest overall score (median, 4; IQR, 3-4), and uterosacral ligament suspension received the lowest (median, 3; IQR, 3-3). Using the 4-point accuracy scale, 44.0% of responses received a score of 4 (\\\"correct and adequate\\\"), 22.6% received a score of 3 (\\\"correct but insufficient\\\"), 29.8% received a score of 2 (\\\"accurate and misleading information together\\\"), and 3.6% received a score of 1 (\\\"wrong or irrelevant answer\\\"). ChatGPT performance was poor for discussion of benefits and alternatives for all surgical procedures, with some responses being inaccurate. The mean Flesch-Kinkaid Grade Level score for all responses was 17.5 (SD, 2.1), corresponding to a postgraduate reading level.</p><p><strong>Conclusions: </strong>Overall, ChatGPT generated accurate responses to questions about surgical informed consent. 
However, it produced clearly false portions of responses, highlighting the need for a careful review of responses by qualified health care professionals.</p>\",\"PeriodicalId\":75288,\"journal\":{\"name\":\"Urogynecology (Hagerstown, Md.)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.8000,\"publicationDate\":\"2025-01-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Urogynecology (Hagerstown, Md.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1097/SPV.0000000000001638\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"OBSTETRICS & GYNECOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Urogynecology (Hagerstown, Md.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1097/SPV.0000000000001638","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"OBSTETRICS & GYNECOLOGY","Score":null,"Total":0}
Citations: 0

Abstract


Importance: Use of the publicly available large language model Chat Generative Pre-trained Transformer (ChatGPT 3.5; OpenAI, 2022) is growing in health care despite variable accuracy.

Objective: The aim of this study was to assess the accuracy and readability of ChatGPT's responses to questions encompassing surgical informed consent in urogynecology.

Study design: Five fellowship-trained urogynecology attending physicians and 1 reconstructive female urologist evaluated ChatGPT's responses to questions about 4 surgical procedures: (1) retropubic midurethral sling, (2) total vaginal hysterectomy, (3) uterosacral ligament suspension, and (4) sacrocolpopexy. Questions involved procedure descriptions, risks/benefits/alternatives, and additional resources. Responses were rated using the DISCERN tool, a 4-point accuracy scale, and the Flesch-Kincaid Grade Level score.
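For context, the Flesch-Kincaid Grade Level is a fixed formula over word and sentence counts: 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The Python sketch below is a minimal illustration using a naive vowel-group syllable heuristic; the study's exact readability tooling is not specified in the abstract.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups (heuristic only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Example: a consent-style sentence scores well above typical
# patient-education targets (roughly grade 6-8).
print(round(flesch_kincaid_grade(
    "The retropubic midurethral sling is a surgical procedure "
    "indicated for the management of stress urinary incontinence."), 1))
```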

Results: The median DISCERN tool overall rating was 3 (interquartile range [IQR], 3-4), indicating a moderate rating ("potentially important but not serious shortcomings"). Retropubic midurethral sling received the highest overall score (median, 4; IQR, 3-4), and uterosacral ligament suspension received the lowest (median, 3; IQR, 3-3). Using the 4-point accuracy scale, 44.0% of responses received a score of 4 ("correct and adequate"), 22.6% received a score of 3 ("correct but insufficient"), 29.8% received a score of 2 ("accurate and misleading information together"), and 3.6% received a score of 1 ("wrong or irrelevant answer"). ChatGPT performance was poor for discussion of benefits and alternatives for all surgical procedures, with some responses being inaccurate. The mean Flesch-Kincaid Grade Level score for all responses was 17.5 (SD, 2.1), corresponding to a postgraduate reading level.
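The median/IQR summaries above are mechanical to reproduce. As a minimal sketch with NumPy, using hypothetical ratings (the study's raw per-reviewer data are not reported in the abstract):

```python
import numpy as np

# Eight hypothetical 1-5 DISCERN overall ratings, for illustration only;
# the study's raw rating data are not published in the abstract.
ratings = np.array([3, 3, 3, 3, 3, 4, 4, 4])

median = np.median(ratings)
q1, q3 = np.percentile(ratings, [25, 75])  # linearly interpolated quartiles
print(f"median, {median:g}; IQR, {q1:g}-{q3:g}")  # -> median, 3; IQR, 3-4
```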

Conclusions: Overall, ChatGPT generated accurate responses to questions about surgical informed consent. However, portions of some responses were clearly false, highlighting the need for careful review of responses by qualified health care professionals.
