The doc versus the bot: A pilot study to assess the quality and accuracy of physician and chatbot responses to clinical questions in gynecologic oncology

IF 1.2 Q3 OBSTETRICS & GYNECOLOGY
Mary Katherine Anastasio, Pamela Peters, Jonathan Foote, Alexander Melamed, Susan C. Modesitt, Fernanda Musa, Emma Rossi, Benjamin B. Albright, Laura J. Havrilesky, Haley A. Moss
DOI: 10.1016/j.gore.2024.101477
Published: 2024-08-08, Gynecologic Oncology Reports
Full-text PDF: https://www.sciencedirect.com/science/article/pii/S2352578924001565/pdfft?md5=726507797baa7aed0cfa4f99c08eb164&pid=1-s2.0-S2352578924001565-main.pdf
Citations: 0

Abstract


Artificial intelligence (AI) applications to medical care are currently under investigation. We aimed to evaluate and compare the quality and accuracy of physician and chatbot responses to common clinical questions in gynecologic oncology. In this cross-sectional pilot study, ten questions about the knowledge and management of gynecologic cancers were selected. Each question was answered by a recruited gynecologic oncologist, the ChatGPT (Generative Pre-trained Transformer) AI platform, and Google's Bard AI platform. Five recruited gynecologic oncologists who were blinded to the study design were allowed 15 minutes to respond to each of two questions. Chatbot responses were generated by entering each question into a fresh session in September 2023. Qualifiers and language identifying the response source were removed. Three gynecologic oncology providers who were blinded to the response source independently reviewed and rated response quality on a 5-point Likert scale, evaluated each response for accuracy, and selected the best response for each question. Overall, physician responses were judged best in 76.7% of evaluations, versus 10.0% for ChatGPT and 13.3% for Bard (p < 0.001). The average response quality was 4.2/5.0 for physicians, 3.0/5.0 for ChatGPT, and 2.8/5.0 for Bard (p < 0.001 for the ANOVA and for both pairwise t-tests). Physicians provided a higher proportion of accurate responses (86.7%) than ChatGPT (60%) and Bard (43%; p < 0.001 for both). Physicians provided higher-quality responses to gynecologic oncology clinical questions than chatbots. Patients should be cautioned against relying on non-validated AI platforms for medical advice; larger studies of AI for medical advice are needed.
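The abstract reports a one-way comparison of Likert quality ratings (ANOVA plus pairwise t-tests) and a comparison of accuracy proportions across the three response sources. As a rough illustration of how that style of analysis can be set up, the sketch below uses Python with SciPy; it is not the authors' code. The rating arrays and accuracy counts are hypothetical placeholders (the raw study data are not given in the abstract), and the chi-squared test for the accuracy proportions is an assumption, since the abstract does not name the test used for that comparison.

```python
# Minimal sketch of the comparisons described in the abstract.
# All numbers below are illustrative placeholders, NOT the study data.
from scipy import stats

# Hypothetical 5-point Likert quality ratings, one value per evaluation.
physician_ratings = [5, 4, 4, 5, 4, 3, 5, 4, 4, 4]
chatgpt_ratings   = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3]
bard_ratings      = [3, 2, 3, 2, 3, 3, 2, 3, 3, 4]

# Overall difference in quality ratings across the three sources (one-way ANOVA).
f_stat, anova_p = stats.f_oneway(physician_ratings, chatgpt_ratings, bard_ratings)

# Pairwise comparisons of physician ratings against each chatbot (t-tests).
t_gpt,  p_gpt  = stats.ttest_ind(physician_ratings, chatgpt_ratings)
t_bard, p_bard = stats.ttest_ind(physician_ratings, bard_ratings)

# Accuracy: counts of accurate vs. inaccurate responses per source
# (hypothetical counts; the abstract reports only percentages).
accuracy_table = [
    [26, 4],   # physicians: accurate, inaccurate
    [18, 12],  # ChatGPT
    [13, 17],  # Bard
]
# Chi-squared test of independence is assumed here for the proportion comparison.
chi2, acc_p, dof, _ = stats.chi2_contingency(accuracy_table)

print(f"ANOVA p={anova_p:.4f}; physician vs ChatGPT p={p_gpt:.4f}; "
      f"physician vs Bard p={p_bard:.4f}; accuracy chi2 p={acc_p:.4f}")
```

With the real per-evaluation ratings and per-source accuracy counts, the same three calls would reproduce the kind of ANOVA, pairwise t-test, and proportion comparisons summarized in the abstract.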

Source journal
Gynecologic Oncology Reports (OBSTETRICS & GYNECOLOGY)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 183
Review time: 41 days
Journal description: Gynecologic Oncology Reports is an online-only, open access journal devoted to the rapid publication of narrative review articles, survey articles, case reports, case series, letters to the editor regarding previously published manuscripts and other short communications in the field of gynecologic oncology. The journal will consider papers that concern tumors of the female reproductive tract, with originality, quality, and clarity the chief criteria of acceptance.