James R Janopaul-Naylor, Andee Koo, David C Qian, Neal S McCall, Yuan Liu, Sagar A Patel
{"title":"医师评估ChatGPT和Bing回答美国癌症协会关于癌症的问题。","authors":"James R Janopaul-Naylor, Andee Koo, David C Qian, Neal S McCall, Yuan Liu, Sagar A Patel","doi":"10.1097/COC.0000000000001050","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>Artificial intelligence (AI) chatbots are a new, publicly available tool for patients to access health care-related information with unknown reliability related to cancer-related questions. This study assesses the quality of responses to common questions for patients with cancer.</p><p><strong>Methods: </strong>From February to March 2023, we queried chat generative pretrained transformer (ChatGPT) from OpenAI and Bing AI from Microsoft questions from the American Cancer Society's recommended \"Questions to Ask About Your Cancer\" customized for all stages of breast, colon, lung, and prostate cancer. Questions were, in addition, grouped by type (prognosis, treatment, or miscellaneous). The quality of AI chatbot responses was assessed by an expert panel using the validated DISCERN criteria.</p><p><strong>Results: </strong>Of the 117 questions presented to ChatGPT and Bing, the average score for all questions were 3.9 and 3.2, respectively ( P < 0.001) and the overall DISCERN scores were 4.1 and 4.4, respectively. By disease site, the average score for ChatGPT and Bing, respectively, were 3.9 and 3.6 for prostate cancer ( P = 0.02), 3.7 and 3.3 for lung cancer ( P < 0.001), 4.1 and 2.9 for breast cancer ( P < 0.001), and 3.8 and 3.0 for colorectal cancer ( P < 0.001). By type of question, the average score for ChatGPT and Bing, respectively, were 3.6 and 3.4 for prognostic questions ( P = 0.12), 3.9 and 3.1 for treatment questions ( P < 0.001), and 4.2 and 3.3 for miscellaneous questions ( P = 0.001). For 3 responses (3%) by ChatGPT and 18 responses (15%) by Bing, at least one panelist rated them as having serious or extensive shortcomings.</p><p><strong>Conclusions: </strong>AI chatbots provide multiple opportunities for innovating health care. This analysis suggests a critical need, particularly around cancer prognostication, for continual refinement to limit misleading counseling, confusion, and emotional distress to patients and families.</p>","PeriodicalId":50812,"journal":{"name":"American Journal of Clinical Oncology-Cancer Clinical Trials","volume":" ","pages":"17-21"},"PeriodicalIF":1.6000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10841271/pdf/","citationCount":"0","resultStr":"{\"title\":\"Physician Assessment of ChatGPT and Bing Answers to American Cancer Society's Questions to Ask About Your Cancer.\",\"authors\":\"James R Janopaul-Naylor, Andee Koo, David C Qian, Neal S McCall, Yuan Liu, Sagar A Patel\",\"doi\":\"10.1097/COC.0000000000001050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>Artificial intelligence (AI) chatbots are a new, publicly available tool for patients to access health care-related information with unknown reliability related to cancer-related questions. 
This study assesses the quality of responses to common questions for patients with cancer.</p><p><strong>Methods: </strong>From February to March 2023, we queried chat generative pretrained transformer (ChatGPT) from OpenAI and Bing AI from Microsoft questions from the American Cancer Society's recommended \\\"Questions to Ask About Your Cancer\\\" customized for all stages of breast, colon, lung, and prostate cancer. Questions were, in addition, grouped by type (prognosis, treatment, or miscellaneous). The quality of AI chatbot responses was assessed by an expert panel using the validated DISCERN criteria.</p><p><strong>Results: </strong>Of the 117 questions presented to ChatGPT and Bing, the average score for all questions were 3.9 and 3.2, respectively ( P < 0.001) and the overall DISCERN scores were 4.1 and 4.4, respectively. By disease site, the average score for ChatGPT and Bing, respectively, were 3.9 and 3.6 for prostate cancer ( P = 0.02), 3.7 and 3.3 for lung cancer ( P < 0.001), 4.1 and 2.9 for breast cancer ( P < 0.001), and 3.8 and 3.0 for colorectal cancer ( P < 0.001). By type of question, the average score for ChatGPT and Bing, respectively, were 3.6 and 3.4 for prognostic questions ( P = 0.12), 3.9 and 3.1 for treatment questions ( P < 0.001), and 4.2 and 3.3 for miscellaneous questions ( P = 0.001). For 3 responses (3%) by ChatGPT and 18 responses (15%) by Bing, at least one panelist rated them as having serious or extensive shortcomings.</p><p><strong>Conclusions: </strong>AI chatbots provide multiple opportunities for innovating health care. This analysis suggests a critical need, particularly around cancer prognostication, for continual refinement to limit misleading counseling, confusion, and emotional distress to patients and families.</p>\",\"PeriodicalId\":50812,\"journal\":{\"name\":\"American Journal of Clinical Oncology-Cancer Clinical Trials\",\"volume\":\" \",\"pages\":\"17-21\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2024-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10841271/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Clinical Oncology-Cancer Clinical Trials\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1097/COC.0000000000001050\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2023/10/12 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q4\",\"JCRName\":\"ONCOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Clinical Oncology-Cancer Clinical Trials","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1097/COC.0000000000001050","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2023/10/12 0:00:00","PubModel":"Epub","JCR":"Q4","JCRName":"ONCOLOGY","Score":null,"Total":0}
Physician Assessment of ChatGPT and Bing Answers to American Cancer Society's Questions to Ask About Your Cancer.
Objectives: Artificial intelligence (AI) chatbots are a new, publicly available tool through which patients can access health care-related information, but their reliability for cancer-related questions is unknown. This study assesses the quality of chatbot responses to common questions asked by patients with cancer.
Methods: From February to March 2023, we queried Chat Generative Pretrained Transformer (ChatGPT) from OpenAI and Bing AI from Microsoft with questions from the American Cancer Society's recommended "Questions to Ask About Your Cancer," customized for all stages of breast, colon, lung, and prostate cancer. Questions were also grouped by type (prognosis, treatment, or miscellaneous). The quality of AI chatbot responses was assessed by an expert panel using the validated DISCERN criteria.
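The abstract does not describe how the questions were submitted; in early 2023, ChatGPT and Bing were typically accessed through their web chat interfaces. Purely as an illustration of the query-and-collect step, the Python sketch below submits a small batch of questions, organized by disease site and question type, to a chat-completion API and saves the responses for later expert DISCERN rating. The model name, example questions, and output file are assumptions for the sketch, not details taken from the study.

```python
# Illustrative only: batch-query a chat model with cancer questions and
# save the responses for later expert DISCERN rating. This is not the
# authors' actual workflow; model name and file paths are assumptions.
import csv
from openai import OpenAI  # OpenAI Python SDK (v1-style client)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A few placeholder entries; the study used 117 ACS-derived questions
# across breast, colon, lung, and prostate cancer.
questions = [
    {"site": "prostate", "type": "prognosis",
     "text": "How serious is my cancer, and what are my chances of survival?"},
    {"site": "breast", "type": "treatment",
     "text": "What are my treatment options, and what do they involve?"},
]

with open("chatbot_responses.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["site", "type", "question", "response"])
    writer.writeheader()
    for q in questions:
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",  # assumed model for illustration
            messages=[{"role": "user", "content": q["text"]}],
        )
        writer.writerow({
            "site": q["site"],
            "type": q["type"],
            "question": q["text"],
            "response": reply.choices[0].message.content,
        })
```

The resulting file keeps each response paired with its disease site and question type, which mirrors the groupings the panel rated in the study.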
Results: Of the 117 questions presented to ChatGPT and Bing, the average score for all questions was 3.9 and 3.2, respectively ( P < 0.001), and the overall DISCERN scores were 4.1 and 4.4, respectively. By disease site, the average scores for ChatGPT and Bing, respectively, were 3.9 and 3.6 for prostate cancer ( P = 0.02), 3.7 and 3.3 for lung cancer ( P < 0.001), 4.1 and 2.9 for breast cancer ( P < 0.001), and 3.8 and 3.0 for colorectal cancer ( P < 0.001). By type of question, the average scores for ChatGPT and Bing, respectively, were 3.6 and 3.4 for prognostic questions ( P = 0.12), 3.9 and 3.1 for treatment questions ( P < 0.001), and 4.2 and 3.3 for miscellaneous questions ( P = 0.001). For 3 responses (3%) by ChatGPT and 18 responses (15%) by Bing, at least one panelist rated the response as having serious or extensive shortcomings.
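The abstract reports mean scores and P values but does not name the statistical test used. As a hedged sketch of how such a paired comparison on the same 117 questions could be run, the snippet below computes means and applies both a paired t-test and a Wilcoxon signed-rank test with SciPy; the rating arrays are random placeholders, and the choice of tests is an assumption rather than the study's stated analysis.

```python
# Illustrative sketch: compare paired mean ratings for the same questions
# answered by two chatbots. Test choice and example data are assumptions,
# not the study's reported analysis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder ratings on a 1-5 scale for the same 117 questions.
chatgpt_scores = rng.integers(3, 6, size=117).astype(float)
bing_scores = rng.integers(2, 5, size=117).astype(float)

print("ChatGPT mean:", round(chatgpt_scores.mean(), 2))
print("Bing mean:   ", round(bing_scores.mean(), 2))

t_stat, p_t = stats.ttest_rel(chatgpt_scores, bing_scores)  # paired t-test
w_stat, p_w = stats.wilcoxon(chatgpt_scores, bing_scores)   # nonparametric alternative
print(f"paired t-test: t={t_stat:.2f}, p={p_t:.4f}")
print(f"Wilcoxon:      W={w_stat:.1f}, p={p_w:.4f}")
```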
Conclusions: AI chatbots provide multiple opportunities for innovating health care. This analysis suggests a critical need, particularly around cancer prognostication, for continual refinement to limit misleading counseling, confusion, and emotional distress to patients and families.
Journal introduction:
American Journal of Clinical Oncology is a multidisciplinary journal for cancer surgeons, radiation oncologists, medical oncologists, GYN oncologists, and pediatric oncologists.
The emphasis of AJCO is on combined-modality, multidisciplinary loco-regional management of cancer. The journal also emphasizes translational research, outcome studies, and cost-utility analyses, and includes opinion pieces and review articles.
The editorial board includes a large number of distinguished surgeons, radiation oncologists, medical oncologists, GYN oncologists, pediatric oncologists, and others who are internationally recognized for expertise in their fields.