Justine Sicard, François Montastruc, Coline Achalme, Annie Pierre Jonville-Bera, Paul Songue, Marina Babin, Thomas Soeiro, Pauline Schiro, Claire de Canecaude, Romain Barus
Title: Can large language models detect drug-drug interactions leading to adverse drug reactions?
DOI: 10.1177/20420986251339358
Journal: Therapeutic Advances in Drug Safety, vol. 16, article 20420986251339358 (JCR Q2, Pharmacology & Pharmacy)
Published: 2025-05-16 (eCollection)
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12084699/pdf/
Citations: 0
Abstract
Background: Drug-drug interactions (DDI) are an important cause of adverse drug reactions (ADRs). Could large language models (LLMs) serve as valuable tools for pharmacovigilance specialists in detecting DDIs that lead to ADR notifications?
Objective: To compare the performance of three LLMs (ChatGPT, Gemini, and Claude) in detecting and explaining clinically significant DDIs that have led to an ADR.
Design: Observational cross-sectional study.
Methods: We used the French National Pharmacovigilance Database to randomly extract Individual Case Safety Reports (ICSRs) of ADRs with DDI (positive controls) and ICSRs of ADRs without DDI (negative controls) registered in 2022. Interaction cases were classified by difficulty level (level-1 DDI being the easiest and level-2 DDI the most difficult). We gave each LLM (ChatGPT, Gemini, and Claude) the same prompt and case summary. Sensitivity, specificity, and F-measure were calculated for each LLM in detecting DDIs in the case summaries.
Results: We assessed 82 ICSRs with DDIs and 22 ICSRs without DDIs. Among ICSRs with DDIs, 37 involved level-1 DDIs and 45 involved level-2 DDIs. Correct responses were more frequent for level-1 DDIs than for level-2 DDIs. Regardless of difficulty level, ChatGPT detected 99% of DDI cases, and Claude and Gemini each detected 95%. The percentage of correct answers to all DDI-related questions was 66% for ChatGPT, 68% for Claude, and 33% for Gemini. ChatGPT and Claude produced comparable results and outperformed Gemini in detecting the drugs involved in DDIs (F-measure 0.83-0.85 for ChatGPT and Claude versus 0.63-0.68 for Gemini). All three exhibited low specificity (ChatGPT 0.68, Claude 0.64, and Gemini 0.36) and reported nonexistent DDIs for negative controls.
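For readers unfamiliar with the metrics above, sensitivity, specificity, and F-measure all follow from a standard binary confusion matrix. A minimal sketch in Python; the counts in the example are illustrative only and are not the study's per-drug data:

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity, and F-measure from confusion counts."""
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_measure

# Hypothetical counts for a model screening 82 DDI cases and 22 non-DDI controls:
sens, spec, f1 = binary_metrics(tp=78, fp=7, tn=15, fn=4)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} F={f1:.2f}")
# → sensitivity=0.95 specificity=0.68 F=0.93
```

Low specificity, as reported for all three LLMs here, means a large share of the negative controls (cases without any interaction) were flagged as containing a DDI.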
Conclusion: LLMs can detect DDIs leading to pharmacovigilance cases, but cannot reliably exclude DDIs in cases without interactions. Pharmacologists are crucial for assessing whether a DDI is implicated in an ADR.
Journal description:
Therapeutic Advances in Drug Safety delivers the highest quality peer-reviewed articles, reviews, and scholarly comment on pioneering efforts and innovative studies pertaining to the safe use of drugs in patients.
The journal has a strong clinical and pharmacological focus and is aimed at clinicians and researchers in drug safety, providing a forum in print and online for publishing the highest quality articles in this area. The editors welcome articles of current interest on research across all areas of drug safety, including therapeutic drug monitoring, pharmacoepidemiology, adverse drug reactions, drug interactions, pharmacokinetics, pharmacovigilance, medication/prescribing errors, risk management, ethics and regulation.