Benedict Morath, Ute Chiriac, Elena Jaszkowski, Carolin Deiß, Hannah Nürnberg, Katrin Hörth, Torsten Hoppe-Tichy, Kim Green
{"title":"药品信息中使用的 ChatGPT 的性能和风险:一项探索性真实世界分析。","authors":"Benedict Morath, Ute Chiriac, Elena Jaszkowski, Carolin Deiß, Hannah Nürnberg, Katrin Hörth, Torsten Hoppe-Tichy, Kim Green","doi":"10.1136/ejhpharm-2023-003750","DOIUrl":null,"url":null,"abstract":"<p><strong>Objectives: </strong>To investigate the performance and risk associated with the usage of Chat Generative Pre-trained Transformer (ChatGPT) to answer drug-related questions.</p><p><strong>Methods: </strong>A sample of 50 drug-related questions were consecutively collected and entered in the artificial intelligence software application ChatGPT. Answers were documented and rated in a standardised consensus process by six senior hospital pharmacists in the domains content (correct, incomplete, false), patient management (possible, insufficient, not possible) and risk (no risk, low risk, high risk). As reference, answers were researched in adherence to the German guideline of drug information and stratified in four categories according to the sources used. In addition, the reproducibility of ChatGPT's answers was analysed by entering three questions at different timepoints repeatedly (day 1, day 2, week 2, week 3).</p><p><strong>Results: </strong>Overall, only 13 of 50 answers provided correct content and had enough information to initiate management with no risk of patient harm. The majority of answers were either false (38%, n=19) or had partly correct content (36%, n=18) and no references were provided. A high risk of patient harm was likely in 26% (n=13) of the cases and risk was judged low for 28% (n=14) of the cases. In all high-risk cases, actions could have been initiated based on the provided information. The answers of ChatGPT varied over time when entered repeatedly and only three out of 12 answers were identical, showing no reproducibility to low reproducibility.</p><p><strong>Conclusion: </strong>In a real-world sample of 50 drug-related questions, ChatGPT answered the majority of questions wrong or partly wrong. The use of artificial intelligence applications in drug information is not possible as long as barriers like wrong content, missing references and reproducibility remain.</p>","PeriodicalId":12050,"journal":{"name":"European journal of hospital pharmacy : science and practice","volume":" ","pages":"491-497"},"PeriodicalIF":1.6000,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Performance and risks of ChatGPT used in drug information: an exploratory real-world analysis.\",\"authors\":\"Benedict Morath, Ute Chiriac, Elena Jaszkowski, Carolin Deiß, Hannah Nürnberg, Katrin Hörth, Torsten Hoppe-Tichy, Kim Green\",\"doi\":\"10.1136/ejhpharm-2023-003750\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Objectives: </strong>To investigate the performance and risk associated with the usage of Chat Generative Pre-trained Transformer (ChatGPT) to answer drug-related questions.</p><p><strong>Methods: </strong>A sample of 50 drug-related questions were consecutively collected and entered in the artificial intelligence software application ChatGPT. Answers were documented and rated in a standardised consensus process by six senior hospital pharmacists in the domains content (correct, incomplete, false), patient management (possible, insufficient, not possible) and risk (no risk, low risk, high risk). 
As reference, answers were researched in adherence to the German guideline of drug information and stratified in four categories according to the sources used. In addition, the reproducibility of ChatGPT's answers was analysed by entering three questions at different timepoints repeatedly (day 1, day 2, week 2, week 3).</p><p><strong>Results: </strong>Overall, only 13 of 50 answers provided correct content and had enough information to initiate management with no risk of patient harm. The majority of answers were either false (38%, n=19) or had partly correct content (36%, n=18) and no references were provided. A high risk of patient harm was likely in 26% (n=13) of the cases and risk was judged low for 28% (n=14) of the cases. In all high-risk cases, actions could have been initiated based on the provided information. The answers of ChatGPT varied over time when entered repeatedly and only three out of 12 answers were identical, showing no reproducibility to low reproducibility.</p><p><strong>Conclusion: </strong>In a real-world sample of 50 drug-related questions, ChatGPT answered the majority of questions wrong or partly wrong. The use of artificial intelligence applications in drug information is not possible as long as barriers like wrong content, missing references and reproducibility remain.</p>\",\"PeriodicalId\":12050,\"journal\":{\"name\":\"European journal of hospital pharmacy : science and practice\",\"volume\":\" \",\"pages\":\"491-497\"},\"PeriodicalIF\":1.6000,\"publicationDate\":\"2024-10-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European journal of hospital pharmacy : science and practice\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1136/ejhpharm-2023-003750\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"PHARMACOLOGY & PHARMACY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European journal of hospital pharmacy : science and practice","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1136/ejhpharm-2023-003750","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"PHARMACOLOGY & PHARMACY","Score":null,"Total":0}
Performance and risks of ChatGPT used in drug information: an exploratory real-world analysis.
Objectives: To investigate the performance of, and risks associated with, using Chat Generative Pre-trained Transformer (ChatGPT) to answer drug-related questions.
Methods: A sample of 50 drug-related questions was consecutively collected and entered into the artificial intelligence software application ChatGPT. Answers were documented and rated in a standardised consensus process by six senior hospital pharmacists in the domains of content (correct, incomplete, false), patient management (possible, insufficient, not possible) and risk (no risk, low risk, high risk). As a reference, answers were researched in adherence to the German guideline on drug information and stratified into four categories according to the sources used. In addition, the reproducibility of ChatGPT's answers was analysed by repeatedly entering three questions at different time points (day 1, day 2, week 2, week 3).
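For illustration only, the sketch below shows one possible way to encode the three rating domains described in the Methods and to run a simple reproducibility check on repeated answers. It is a minimal Python example under stated assumptions: the exact-string comparison of repeated answers and the sample question are illustrative choices, not the authors' reported procedure or tooling.

```python
# Hypothetical sketch (not the authors' actual tooling): encode the three
# rating domains from the consensus process and check whether repeated
# answers to the same question are identical, as a crude reproducibility measure.
from dataclasses import dataclass
from enum import Enum


class Content(Enum):
    CORRECT = "correct"
    INCOMPLETE = "incomplete"
    FALSE = "false"


class Management(Enum):
    POSSIBLE = "possible"
    INSUFFICIENT = "insufficient"
    NOT_POSSIBLE = "not possible"


class Risk(Enum):
    NO_RISK = "no risk"
    LOW_RISK = "low risk"
    HIGH_RISK = "high risk"


@dataclass
class Rating:
    """Consensus rating for a single drug-related question."""
    question: str
    content: Content
    management: Management
    risk: Risk


def identical_answer_count(answers: list[str]) -> int:
    """Count how many repeated answers match the first answer exactly.

    Exact string identity is an assumption made here for illustration;
    the study does not state its comparison criterion in the abstract.
    """
    reference = answers[0].strip()
    return sum(1 for a in answers[1:] if a.strip() == reference)


if __name__ == "__main__":
    # Hypothetical example question and rating.
    example = Rating(
        question="Dosing of drug X in renal impairment?",
        content=Content.INCOMPLETE,
        management=Management.INSUFFICIENT,
        risk=Risk.LOW_RISK,
    )
    print(example)

    # One question entered at four time points (day 1, day 2, week 2, week 3).
    repeats = ["Answer v1", "Answer v1", "Answer v2", "Answer v3"]
    print(identical_answer_count(repeats))  # -> 1
```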
Results: Overall, only 13 of 50 answers provided correct content and contained enough information to initiate management with no risk of patient harm. The majority of answers were either false (38%, n=19) or only partly correct (36%, n=18), and no references were provided. A high risk of patient harm was likely in 26% (n=13) of cases, and the risk was judged low in 28% (n=14) of cases. In all high-risk cases, actions could have been initiated based on the information provided. ChatGPT's answers varied over time when questions were entered repeatedly; only three of 12 answers were identical, indicating low to no reproducibility.
Conclusion: In a real-world sample of 50 drug-related questions, ChatGPT answered the majority of questions incorrectly or only partly correctly. The use of artificial intelligence applications in drug information is not feasible as long as barriers such as incorrect content, missing references and lack of reproducibility remain.
Journal introduction:
European Journal of Hospital Pharmacy (EJHP) offers a high quality, peer-reviewed platform for the publication of practical and innovative research which aims to strengthen the profile and professional status of hospital pharmacists. EJHP is committed to being the leading journal on all aspects of hospital pharmacy, thereby advancing the science, practice and profession of hospital pharmacy. The journal aims to become a major source for education and inspiration to improve practice and the standard of patient care in hospitals and related institutions worldwide.
EJHP is the only official journal of the European Association of Hospital Pharmacists.