AI-driven pharmacovigilance: Enhancing adverse drug reaction detection with deep learning and NLP

IF 1.6 Q2 MULTIDISCIPLINARY SCIENCES

MethodsX Pub Date : 2025-06-23 DOI:10.1016/j.mex.2025.103460

Dr. Bharti Khemani, Dr. Sachin Malave, Samyukta Shinde, Mandvi Shukla, Razzaq Shikalgar, Harshita Talwar

MethodsX, volume 15, Article 103460. https://www.sciencedirect.com/science/article/pii/S221501612500305X

Citations: 0

Abstract

In the healthcare industry, the ever-increasing volume of clinical trial data makes it harder to ensure drug safety and detect adverse drug reactions (ADRs). This study addresses the accurate detection of Serious Adverse Events (SAEs) in pharmacovigilance, a critical component of drug safety during and after clinical trials. The key problem is the underreporting and delayed detection of ADRs, caused by the heterogeneous nature of medical data, class imbalance, and the limited scope of traditional monitoring techniques. The study proposes a hybrid AI-driven framework that integrates structured data (e.g., patient demographics, lab results) and unstructured data (e.g., clinical notes) to detect ADRs using deep learning and NLP methods. The objective is to outperform traditional signal detection methods and provide interpretable predictions that aid clinicians in real time. Using Machine Learning (ML) and Deep Learning (DL) techniques, including Random Forests, Gradient Boosting Machines, and Convolutional Neural Networks (CNNs), our model aims to identify potential ADRs across different patient subgroups. Through feature engineering and techniques that address data imbalance, our model demonstrates improved accuracy and interpretability in predicting ADRs. The CNN model achieved an accuracy of 85%, outperforming traditional models such as Logistic Regression (78%) and Support Vector Machines (80%). These findings suggest that specific demographic and clinical factors significantly influence the likelihood of adverse reactions, offering insights for targeted monitoring and risk mitigation strategies [11]. This research underscores the potential of predictive modeling to enhance pharmacovigilance efforts and ensure safer clinical trial outcomes.
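The abstract names class imbalance as a core difficulty but does not say which technique was used to address it. One common option, shown here purely as an illustrative assumption and not as the authors' method, is inverse-frequency class weighting, where each class receives the weight n_samples / (n_classes * class_count). The function name and the toy labels below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1, so that
    each class contributes equally to a weighted loss in aggregate.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels (not from the paper): 1 = ADR, 0 = no ADR.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With a 90/10 split, the rare ADR class is weighted nine times more heavily than the majority class, which is the intended effect of this scheme. Resampling approaches are an alternative; the abstract does not indicate which family the study relied on.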

• The methodology compares supervised learning algorithms such as Logistic Regression, Random Forest, Gradient Boosting, and CNN, along with genetic algorithms, to identify patterns and anomalies in clinical trial data. BERT and GPT were also employed to support text-based interaction with medical data.
• Accuracy, precision, recall, and F1-score were applied systematically to evaluate each model. Among the models tested, the CNN model with BERT achieved the highest accuracy, indicating the potential of deep learning for enhancing pharmacovigilance practices.
• These findings suggest that supplying diverse clinical data to advanced ML and NLP techniques can significantly improve ADR detection, leading to better alignment with the fundamental principles of Good Clinical Practice (GCP).
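The four metrics named above can be computed from a binary confusion matrix. The sketch below is a minimal illustration on hypothetical toy predictions (none of the numbers come from the paper). It shows why accuracy alone can mislead on imbalanced ADR data: a model that never flags an ADR still scores high accuracy while its recall is zero.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy data: 90 non-ADR cases followed by 10 ADR cases.
y_true = [0] * 90 + [1] * 10

# Baseline that never predicts an ADR.
baseline = binary_metrics(y_true, [0] * 100)

# Detector with 5 false positives, 7 true positives, 3 false negatives.
detector = binary_metrics(y_true, [0] * 85 + [1] * 5 + [1] * 7 + [0] * 3)

print(baseline)
print(detector)
```

On this toy data the two models differ by only two points of accuracy, yet the baseline misses every ADR, so recall and F1 are what separate them.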
