Can large language models detect drug-drug interactions leading to adverse drug reactions?

IF 3.4, Zone 3 (Medicine), JCR Q2 PHARMACOLOGY & PHARMACY
Therapeutic Advances in Drug Safety. Pub Date: 2025-05-16. eCollection Date: 2025-01-01. DOI: 10.1177/20420986251339358
Justine Sicard, François Montastruc, Coline Achalme, Annie Pierre Jonville-Bera, Paul Songue, Marina Babin, Thomas Soeiro, Pauline Schiro, Claire de Canecaude, Romain Barus
Volume 16, article 20420986251339358. Publication type: Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12084699/pdf/

Abstract

Background: Drug-drug interactions (DDIs) are an important cause of adverse drug reactions (ADRs). Could large language models (LLMs) serve as valuable tools for pharmacovigilance specialists in detecting DDIs that lead to ADR notifications?

Objective: To compare the performance of three LLMs (ChatGPT, Gemini, and Claude) in detecting and explaining clinically significant DDIs that have led to an ADR.

Design: Observational cross-sectional study.

Methods: We used the French National Pharmacovigilance Database to randomly extract Individual Case Safety Reports (ICSRs) of ADRs with DDI (positive controls) and ICSRs of ADRs without DDI (negative controls) registered in 2022. Interaction cases were classified by difficulty level (level-1 DDIs being the easier and level-2 DDIs the more difficult). We gave each LLM (ChatGPT, Gemini, and Claude) the same prompt and case summary. Sensitivity, specificity, and F-measure were calculated for each LLM in detecting DDIs in the case summaries.

Results: We assessed 82 ICSRs with DDIs and 22 ICSRs without DDIs. Among ICSRs with DDIs, 37 involved level-1 DDIs, and 45 involved level-2 DDIs. Correct responses were more frequent for level-1 DDIs than for level-2 DDIs. Regardless of difficulty level, ChatGPT detected 99% of DDI cases, and Claude and Gemini each detected 95%. The percentage of correct answers to all DDI-related questions was 66% for ChatGPT, 68% for Claude, and 33% for Gemini. In identifying the drugs involved in DDIs, ChatGPT and Claude produced comparable results and outperformed Gemini (F-measure between 0.83 and 0.85 for ChatGPT and Claude, and between 0.63 and 0.68 for Gemini). All three exhibited low specificity (ChatGPT 0.68, Claude 0.64, and Gemini 0.36) and reported nonexistent DDIs for negative controls.
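
The sensitivity, specificity, and F-measure named in the Methods can all be derived from a 2x2 confusion matrix. The following is a minimal Python sketch of that arithmetic, not the authors' analysis code. The class name `Confusion` and the per-model counts are assumptions: the counts are back-calculated from the rounded figures above (for example, 81 of 82 positive controls is about 99%, and 15 of 22 negative controls is about 0.68) and are not reported in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 confusion matrix for case-level DDI detection (hypothetical helper)."""
    tp: int  # positive controls (ICSRs with DDI) flagged as DDI
    fn: int  # positive controls where the DDI was missed
    tn: int  # negative controls correctly reported as having no DDI
    fp: int  # negative controls where a nonexistent DDI was reported

    def sensitivity(self) -> float:
        # Share of true DDI cases that were detected (also called recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of non-DDI cases that were correctly left unflagged.
        return self.tn / (self.tn + self.fp)

    def f_measure(self) -> float:
        # Harmonic mean of precision and recall (F1).
        precision = self.tp / (self.tp + self.fp)
        recall = self.sensitivity()
        return 2 * precision * recall / (precision + recall)


# ASSUMPTION: counts inferred from the rounded percentages in the abstract
# (82 positive controls, 22 negative controls); they are not published values.
models = {
    "ChatGPT": Confusion(tp=81, fn=1, tn=15, fp=7),
    "Claude": Confusion(tp=78, fn=4, tn=14, fp=8),
    "Gemini": Confusion(tp=78, fn=4, tn=8, fp=14),
}

for name, cm in models.items():
    print(
        f"{name}: sensitivity={cm.sensitivity():.2f} "
        f"specificity={cm.specificity():.2f} "
        f"F-measure={cm.f_measure():.2f}"
    )
```

The F-measure printed by this sketch is for case-level detection under the assumed counts. It is not comparable to the F-measures quoted above, which concern identifying the drugs involved in each DDI.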

Conclusion: LLMs can detect DDIs leading to pharmacovigilance cases, but cannot reliably exclude DDIs in cases without interactions. Pharmacologists are crucial for assessing whether a DDI is implicated in an ADR.

Source journal
Therapeutic Advances in Drug Safety (Medicine, Pharmacology (medical))
CiteScore: 6.70
Self-citation rate: 4.50%
Articles published: 31
Review time: 9 weeks
Journal description: Therapeutic Advances in Drug Safety delivers the highest quality peer-reviewed articles, reviews, and scholarly comment on pioneering efforts and innovative studies pertaining to the safe use of drugs in patients. The journal has a strong clinical and pharmacological focus and is aimed at clinicians and researchers in drug safety, providing a forum in print and online for publishing the highest quality articles in this area. The editors welcome articles of current interest on research across all areas of drug safety, including therapeutic drug monitoring, pharmacoepidemiology, adverse drug reactions, drug interactions, pharmacokinetics, pharmacovigilance, medication/prescribing errors, risk management, ethics and regulation.