Artificial intelligence-powered chatbots in search engines: a cross-sectional study on the quality and risks of drug information for patients.

IF 5.6 | Zone 1 (Medicine) | Q1 HEALTH CARE SCIENCES & SERVICES

BMJ Quality & Safety | Pub Date: 2025-01-28 | DOI: 10.1136/bmjqs-2024-017476

Wahram Andrikyan, Sophie Marie Sametinger, Frithjof Kosfeld, Lea Jung-Poppe, Martin F Fromm, Renke Maas, Hagen F Nicolaus

Pages: 100-109 | Publication type: Journal Article | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11874309/pdf/

Citations: 0

Abstract

Background: Search engines often serve as a primary resource for patients to obtain drug information. However, the search engine market is rapidly changing due to the introduction of artificial intelligence (AI)-powered chatbots. The consequences for medication safety when patients interact with chatbots remain largely unexplored.

Objective: To explore the quality and potential safety concerns of answers provided by an AI-powered chatbot integrated within a search engine.

Methodology: Bing copilot was queried on 10 frequently asked patient questions regarding the 50 most prescribed drugs in the US outpatient market. Patient questions covered drug indications, mechanisms of action, instructions for use, adverse drug reactions and contraindications. Readability of chatbot answers was assessed using the Flesch Reading Ease Score. Completeness and accuracy were evaluated based on corresponding patient drug information in the pharmaceutical encyclopaedia drugs.com. On a preselected subset of inaccurate chatbot answers, healthcare professionals evaluated likelihood and extent of possible harm if patients follow the chatbot's given recommendations.
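The Flesch Reading Ease Score named in the methodology is computed from average sentence length and average syllables per word: 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The following is a minimal Python sketch of that formula, not the authors' tooling. The function names, the regex-based tokenisation and the vowel-group syllable heuristic are assumptions made for illustration; dedicated readability tools count syllables more carefully and may give somewhat different scores. The band labels follow the commonly cited Flesch interpretation table.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    # Sentences are split on runs of terminal punctuation (simplified).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def describe(score: float) -> str:
    """Map a score to the commonly cited Flesch difficulty bands."""
    bands = [
        (90, "very easy"),
        (80, "easy"),
        (70, "fairly easy"),
        (60, "standard"),
        (50, "fairly difficult"),
        (30, "difficult"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "very difficult"


if __name__ == "__main__":
    sample = "Take one tablet by mouth every morning. Do not crush or chew it."
    score = flesch_reading_ease(sample)
    print(f"{score:.1f} -> {describe(score)}")
```

Under these bands, a score between 30 and 50 is labelled "difficult", which is the kind of reading level the Results paragraph describes for the chatbot answers.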

Results: Of 500 generated chatbot answers, overall readability implied that responses were difficult to read according to the Flesch Reading Ease Score. Overall median completeness and accuracy of chatbot answers were 100.0% (IQR 50.0-100.0%) and 100.0% (IQR 88.1-100.0%), respectively. Of the subset of 20 chatbot answers, experts found 66% (95% CI 50% to 85%) to be potentially harmful. 42% (95% CI 25% to 60%) of these 20 chatbot answers were found to potentially cause moderate to mild harm, and 22% (95% CI 10% to 40%) to cause severe harm or even death if patients follow the chatbot's advice.

Conclusions: AI-powered chatbots are capable of providing overall complete and accurate patient drug information. Yet, experts deemed a considerable number of answers incorrect or potentially harmful. Furthermore, complexity of chatbot answers may limit patient understanding. Hence, healthcare professionals should be cautious in recommending AI-powered search engines until more precise and reliable alternatives are available.

Source journal: BMJ Quality & Safety (HEALTH CARE SCIENCES & SERVICES)

CiteScore: 9.80

Self-citation rate: 7.40%

Articles published: 104

Review time: 4-8 weeks

About the journal: BMJ Quality & Safety (previously Quality & Safety in Health Care) is an international peer-reviewed publication providing research, opinions, debates and reviews for academics, clinicians and healthcare managers focused on the quality and safety of health care and the science of improvement. The journal receives approximately 1000 manuscripts a year and has an acceptance rate for original research of 12%. Time from submission to first decision averages 22 days and accepted articles are typically published online within 20 days. Its current impact factor is 3.281.