The doc versus the bot: A pilot study to assess the quality and accuracy of physician and chatbot responses to clinical questions in gynecologic oncology

IF 1.2 Q3 OBSTETRICS & GYNECOLOGY
Mary Katherine Anastasio, Pamela Peters, Jonathan Foote, Alexander Melamed, Susan C. Modesitt, Fernanda Musa, Emma Rossi, Benjamin B. Albright, Laura J. Havrilesky, Haley A. Moss
DOI: 10.1016/j.gore.2024.101477
Published: 2024-08-08, Gynecologic Oncology Reports
Full-text PDF: https://www.sciencedirect.com/science/article/pii/S2352578924001565/pdfft?md5=726507797baa7aed0cfa4f99c08eb164&pid=1-s2.0-S2352578924001565-main.pdf
Citations: 0

Abstract


Artificial intelligence (AI) applications to medical care are currently under investigation. We aimed to evaluate and compare the quality and accuracy of physician and chatbot responses to common clinical questions in gynecologic oncology. In this cross-sectional pilot study, ten questions about the knowledge and management of gynecologic cancers were selected. Each question was answered by a recruited gynecologic oncologist, the ChatGPT (Generative Pre-trained Transformer) AI platform, and Google's Bard AI platform. Five recruited gynecologic oncologists who were blinded to the study design were allowed 15 minutes to respond to each of two questions. Chatbot responses were generated by entering each question into a fresh session in September 2023. Qualifiers and language identifying the response source were removed. Three gynecologic oncology providers who were blinded to the response source independently reviewed and rated response quality on a 5-point Likert scale, evaluated each response for accuracy, and selected the best response for each question. Overall, physician responses were judged best in 76.7% of evaluations, versus 10.0% for ChatGPT and 13.3% for Bard (p < 0.001). The average response quality was 4.2/5.0 for physicians, 3.0/5.0 for ChatGPT, and 2.8/5.0 for Bard (p < 0.001 for the ANOVA and for both pairwise t-tests). Physicians provided a higher proportion of accurate responses (86.7%) than ChatGPT (60%) and Bard (43%; p < 0.001 for both). Physicians provided higher-quality responses to gynecologic oncology clinical questions than chatbots. Patients should be cautioned against relying on non-validated AI platforms for medical advice; larger studies of AI for medical advice are needed.
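The abstract reports a one-way comparison of Likert quality ratings (ANOVA plus pairwise t-tests) and a comparison of accuracy proportions across the three response sources. As a rough illustration of how that style of analysis can be set up, the sketch below uses Python with SciPy; it is not the authors' code. The rating arrays and accuracy counts are hypothetical placeholders (the raw study data are not given in the abstract), and the chi-squared test for the accuracy proportions is an assumption, since the abstract does not name the test used for that comparison.

```python
# Minimal sketch of the comparisons described in the abstract.
# All numbers below are illustrative placeholders, NOT the study data.
from scipy import stats

# Hypothetical 5-point Likert quality ratings, one value per evaluation.
physician_ratings = [5, 4, 4, 5, 4, 3, 5, 4, 4, 4]
chatgpt_ratings   = [3, 3, 2, 4, 3, 3, 2, 3, 4, 3]
bard_ratings      = [3, 2, 3, 2, 3, 3, 2, 3, 3, 4]

# Overall difference in quality ratings across the three sources (one-way ANOVA).
f_stat, anova_p = stats.f_oneway(physician_ratings, chatgpt_ratings, bard_ratings)

# Pairwise comparisons of physician ratings against each chatbot (t-tests).
t_gpt,  p_gpt  = stats.ttest_ind(physician_ratings, chatgpt_ratings)
t_bard, p_bard = stats.ttest_ind(physician_ratings, bard_ratings)

# Accuracy: counts of accurate vs. inaccurate responses per source
# (hypothetical counts; the abstract reports only percentages).
accuracy_table = [
    [26, 4],   # physicians: accurate, inaccurate
    [18, 12],  # ChatGPT
    [13, 17],  # Bard
]
# Chi-squared test of independence is assumed here for the proportion comparison.
chi2, acc_p, dof, _ = stats.chi2_contingency(accuracy_table)

print(f"ANOVA p={anova_p:.4f}; physician vs ChatGPT p={p_gpt:.4f}; "
      f"physician vs Bard p={p_bard:.4f}; accuracy chi2 p={acc_p:.4f}")
```

With the real per-evaluation ratings and per-source accuracy counts, the same three calls would reproduce the kind of ANOVA, pairwise t-test, and proportion comparisons summarized in the abstract.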

Source journal
Gynecologic Oncology Reports (OBSTETRICS & GYNECOLOGY)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 183
Review time: 41 days
Journal description: Gynecologic Oncology Reports is an online-only, open access journal devoted to the rapid publication of narrative review articles, survey articles, case reports, case series, letters to the editor regarding previously published manuscripts and other short communications in the field of gynecologic oncology. The journal will consider papers that concern tumors of the female reproductive tract, with originality, quality, and clarity the chief criteria of acceptance.