{"title":"分析人工智能程序对源自重症监护指南的药物相关问题的回应。","authors":"Blake Williams, Brian L Erstad","doi":"10.1093/ajhp/zxaf075","DOIUrl":null,"url":null,"abstract":"<p><strong>Disclaimer: </strong>In an effort to expedite the publication of articles, AJHP is posting manuscripts online as soon as possible after acceptance. Accepted manuscripts have been peer-reviewed and copyedited, but are posted online before technical formatting and author proofing. These manuscripts are not the final version of record and will be replaced with the final article (formatted per AJHP style and proofed by the authors) at a later time.</p><p><strong>Purpose: </strong>To evaluate the recommendations given by 4 publicly available artificial intelligence (AI) programs in comparison to recommendations in current clinical practice guidelines (CPGs) focused on critically ill adults.</p><p><strong>Methods: </strong>This study evaluated 4 publicly available large language models (LLMs): ChatGPT 4.0, Microsoft Copilot Google Gemini Version 1.5, and Meta AI. Each AI chatbot was prompted with medication-related questions related to 6 CPGs published by the Society of Critical Care Medicine (SCCM) and also asked to provide references to support its recommendations. Responses were categorized as correct, partially correct, not correct, or \"other\" (eg, the LLM answered a question not asked).</p><p><strong>Results: </strong>In total, 43 responses were recorded for each AI program, with a significant difference (P = 0.007) in response types by AI program. Microsoft Copilot had the highest proportion of correct recommendations, followed by Meta AI, ChatGPT 4.0, and Google Gemini. All 4 LLMs gave some incorrect recommendations, with Gemini having the most incorrect responses, followed closely by ChatGPT. Copilot had the most responses in the \"other\" category (n = 5, 11.63%). On average, ChatGPT provided the greatest number of references per question (n = 4.54), followed by Google Gemini (n = 3.43), Meta AI (n = 3.06), and Microsoft Copilot (n = 2.04).</p><p><strong>Conclusion: </strong>Although they showed potential for future utility to pharmacists with further development and refinement, the evaluated AI programs did not consistently give accurate medication-related recommendations for the purpose of answering clinical questions such as those pertaining to critical care CPGs.</p>","PeriodicalId":7577,"journal":{"name":"American Journal of Health-System Pharmacy","volume":" ","pages":""},"PeriodicalIF":2.1000,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Analysis of responses from artificial intelligence programs to medication-related questions derived from critical care guidelines.\",\"authors\":\"Blake Williams, Brian L Erstad\",\"doi\":\"10.1093/ajhp/zxaf075\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Disclaimer: </strong>In an effort to expedite the publication of articles, AJHP is posting manuscripts online as soon as possible after acceptance. Accepted manuscripts have been peer-reviewed and copyedited, but are posted online before technical formatting and author proofing. 
These manuscripts are not the final version of record and will be replaced with the final article (formatted per AJHP style and proofed by the authors) at a later time.</p><p><strong>Purpose: </strong>To evaluate the recommendations given by 4 publicly available artificial intelligence (AI) programs in comparison to recommendations in current clinical practice guidelines (CPGs) focused on critically ill adults.</p><p><strong>Methods: </strong>This study evaluated 4 publicly available large language models (LLMs): ChatGPT 4.0, Microsoft Copilot Google Gemini Version 1.5, and Meta AI. Each AI chatbot was prompted with medication-related questions related to 6 CPGs published by the Society of Critical Care Medicine (SCCM) and also asked to provide references to support its recommendations. Responses were categorized as correct, partially correct, not correct, or \\\"other\\\" (eg, the LLM answered a question not asked).</p><p><strong>Results: </strong>In total, 43 responses were recorded for each AI program, with a significant difference (P = 0.007) in response types by AI program. Microsoft Copilot had the highest proportion of correct recommendations, followed by Meta AI, ChatGPT 4.0, and Google Gemini. All 4 LLMs gave some incorrect recommendations, with Gemini having the most incorrect responses, followed closely by ChatGPT. Copilot had the most responses in the \\\"other\\\" category (n = 5, 11.63%). On average, ChatGPT provided the greatest number of references per question (n = 4.54), followed by Google Gemini (n = 3.43), Meta AI (n = 3.06), and Microsoft Copilot (n = 2.04).</p><p><strong>Conclusion: </strong>Although they showed potential for future utility to pharmacists with further development and refinement, the evaluated AI programs did not consistently give accurate medication-related recommendations for the purpose of answering clinical questions such as those pertaining to critical care CPGs.</p>\",\"PeriodicalId\":7577,\"journal\":{\"name\":\"American Journal of Health-System Pharmacy\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2025-03-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"American Journal of Health-System Pharmacy\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1093/ajhp/zxaf075\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"PHARMACOLOGY & PHARMACY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"American Journal of Health-System Pharmacy","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1093/ajhp/zxaf075","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"PHARMACOLOGY & PHARMACY","Score":null,"Total":0}
Analysis of responses from artificial intelligence programs to medication-related questions derived from critical care guidelines.
Disclaimer: In an effort to expedite the publication of articles, AJHP is posting manuscripts online as soon as possible after acceptance. Accepted manuscripts have been peer-reviewed and copyedited, but are posted online before technical formatting and author proofing. These manuscripts are not the final version of record and will be replaced with the final article (formatted per AJHP style and proofed by the authors) at a later time.
Purpose: To evaluate the recommendations given by 4 publicly available artificial intelligence (AI) programs in comparison to recommendations in current clinical practice guidelines (CPGs) focused on critically ill adults.
Methods: This study evaluated 4 publicly available large language models (LLMs): ChatGPT 4.0, Microsoft Copilot, Google Gemini Version 1.5, and Meta AI. Each AI chatbot was prompted with medication-related questions derived from 6 CPGs published by the Society of Critical Care Medicine (SCCM) and was also asked to provide references to support its recommendations. Responses were categorized as correct, partially correct, not correct, or "other" (eg, the LLM answered a question not asked).
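The abstract does not describe the tooling used to collect and grade responses; purely as an illustration, an evaluation loop of the kind described might look like the minimal sketch below. The query_model and grade helpers, the model labels, and the prompt wording are hypothetical placeholders, not the authors' actual procedure (grading in the study was presumably manual adjudication against the SCCM guidelines).

```python
from enum import Enum

# Response categories as defined in the study.
class Category(Enum):
    CORRECT = "correct"
    PARTIALLY_CORRECT = "partially correct"
    NOT_CORRECT = "not correct"
    OTHER = "other"  # eg, the LLM answered a question not asked

MODELS = ["ChatGPT 4.0", "Microsoft Copilot", "Google Gemini 1.5", "Meta AI"]

def query_model(model: str, prompt: str) -> str:
    """Hypothetical wrapper around each chatbot's interface; stubbed here."""
    raise NotImplementedError

def grade(response: str, guideline_recommendation: str) -> Category:
    """Hypothetical stand-in for manual grading against the CPG recommendation."""
    raise NotImplementedError

def evaluate(questions: list[tuple[str, str]]) -> dict[str, list[Category]]:
    # questions: (medication-related question, guideline recommendation) pairs
    results: dict[str, list[Category]] = {m: [] for m in MODELS}
    for question, reference_answer in questions:
        prompt = f"{question} Please provide references to support your recommendation."
        for model in MODELS:
            response = query_model(model, prompt)
            results[model].append(grade(response, reference_answer))
    return results
```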
Results: In total, 43 responses were recorded for each AI program, with a significant difference (P = 0.007) in response types by AI program. Microsoft Copilot had the highest proportion of correct recommendations, followed by Meta AI, ChatGPT 4.0, and Google Gemini. All 4 LLMs gave some incorrect recommendations, with Gemini having the most incorrect responses, followed closely by ChatGPT. Copilot had the most responses in the "other" category (n = 5, 11.63%). On average, ChatGPT provided the greatest number of references per question (n = 4.54), followed by Google Gemini (n = 3.43), Meta AI (n = 3.06), and Microsoft Copilot (n = 2.04).
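The abstract does not name the statistical test behind P = 0.007; a chi-square test of independence on the program-by-category contingency table would be a conventional choice for comparing response-type distributions. A minimal sketch using SciPy follows. The counts are invented placeholders, chosen only to match the orderings reported above and the one count given explicitly (Copilot's 5 "other" responses); they are not the study's data.

```python
from scipy.stats import chi2_contingency

# Rows: AI programs; columns: correct, partially correct, not correct, other.
# Each row sums to 43 responses per program, per the abstract. All values
# except Copilot's "other" count (n = 5) are illustrative placeholders.
observed = [
    [20, 12, 9, 2],   # ChatGPT 4.0 (placeholder)
    [25, 10, 3, 5],   # Microsoft Copilot (placeholder; "other" n = 5 reported)
    [18, 11, 11, 3],  # Google Gemini (placeholder)
    [23, 11, 7, 2],   # Meta AI (placeholder)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, P = {p:.3f}")
```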
Conclusion: Although the evaluated AI programs showed potential for future utility to pharmacists with further development and refinement, they did not consistently give accurate medication-related recommendations when answering clinical questions such as those pertaining to critical care CPGs.
About the journal:
The American Journal of Health-System Pharmacy (AJHP) is the official publication of the American Society of Health-System Pharmacists (ASHP). It publishes peer-reviewed scientific papers on contemporary drug therapy and pharmacy practice innovations in hospitals and health systems. With a circulation of more than 43,000, AJHP is the most widely recognized and respected clinical pharmacy journal in the world.