A novel weighted pseudo-labeling framework based on matrix factorization for adverse drug reaction prediction.

IF 2.9, Zone 3 (Biology), JCR Q2 (Biochemical Research Methods)

BMC Bioinformatics Pub Date : 2025-02-17 DOI:10.1186/s12859-025-06053-z

Junheng Chen, Fangfang Han, Mingxiu He, Yiyang Shi, Yongming Cai

BMC Bioinformatics 26(1):54. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11831795/pdf/

Citations: 0

Abstract

Adverse drug reactions (ADRs) are global public health events that seriously endanger human life and impose high economic burdens, so predicting their occurrence and responding early and effectively is of great significance. One current strategy for predicting ADRs from accessible public data is to construct an association matrix between drugs and their adverse reactions and then mine it for likely associations. Because known ADRs in real-world data are far fewer than unknown ones, the drug-ADR association matrix is very sparse, which greatly degrades the classification performance of machine learning methods. To address this sparsity, we propose a novel weighted pseudo-labeling framework that integrates multiple weighted matrix factorization (MF) models to mine potential unknown drug-ADR pairs and treats the mined pairs as pseudo-labeled drug-ADR pairs. The pseudo-labeled data are added to the training set, and the MF model is fine-tuned to improve classification performance. To prevent overfitting to easily found pseudo-labels and to improve pseudo-label quality, a novel weighting approach for pseudo-labels is adopted. We reproduce the baselines under the same experimental conditions and evaluate the proposed method on sparse data from the Side Effect Resource (SIDER) database. Experimental results show that our method outperforms the baselines in area under the precision-recall curve (AUPR) and F1-score and maintains the best performance in sparser scenarios. Furthermore, a case study shows that the proposed framework efficiently predicts ADRs in the real world.
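The abstract does not specify the models or the weighting formula, so the following is only a minimal NumPy sketch of the general idea (weighted MF ensemble, pseudo-label selection, weighted fine-tuning), not the authors' implementation. The function names, the squared-error objective, the weights on unobserved entries, and the confidence formula (clipped ensemble mean damped by ensemble disagreement) are all assumptions made for illustration, and the data are synthetic rather than taken from SIDER.

```python
import numpy as np

def weighted_mf(Y, W, rank=4, lam=0.05, lr=0.02, epochs=500, seed=0, init=None):
    """Gradient descent on sum(W * (U V^T - Y)^2) / 2 + lam/2 * (|U|^2 + |V|^2)."""
    rng = np.random.default_rng(seed)
    if init is None:
        U = 0.1 * rng.standard_normal((Y.shape[0], rank))
        V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    else:
        U, V = init[0].copy(), init[1].copy()
    for _ in range(epochs):
        E = W * (U @ V.T - Y)          # weighted residual
        grad_U = E @ V + lam * U
        grad_V = E.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def pseudo_label(Y, neg_weights=(0.1, 0.3, 0.5), top_k=10):
    """Score unknown entries with an ensemble of weighted MF models (assumed setup)."""
    unknown = (Y == 0)
    scores = []
    for seed, a in enumerate(neg_weights):
        W = np.where(unknown, a, 1.0)  # down-weight unobserved entries
        U, V = weighted_mf(Y, W, seed=seed)
        scores.append(U @ V.T)
    scores = np.stack(scores)
    mean, std = scores.mean(axis=0), scores.std(axis=0)
    # Rank only the unknown entries by their mean ensemble score.
    masked = np.where(unknown, mean, -np.inf)
    flat = np.argsort(masked, axis=None)[::-1][:top_k]
    rows, cols = np.unravel_index(flat, Y.shape)
    # Hypothetical confidence weight in [0, 1]: high score, low disagreement.
    conf = np.clip(mean[rows, cols], 0.0, 1.0) / (1.0 + std[rows, cols])
    return rows, cols, conf

def fine_tune(Y, rows, cols, conf, neg_weight=0.3):
    """Train a base model, then continue training with weighted pseudo-labels."""
    W = np.where(Y == 0, neg_weight, 1.0)
    U, V = weighted_mf(Y, W, seed=0)
    Y_aug, W_aug = Y.copy(), W.copy()
    Y_aug[rows, cols] = 1.0            # pseudo-labeled pairs become positives
    W_aug[rows, cols] = conf           # but only count as much as their weight
    return weighted_mf(Y_aug, W_aug, epochs=200, init=(U, V))

# Synthetic demo: a low-rank "true" association matrix with half its positives hidden.
rng = np.random.default_rng(42)
Y_true = (rng.random((30, 4)) @ rng.random((4, 20)) > 1.3).astype(float)
Y_obs = Y_true * (rng.random(Y_true.shape) < 0.5)

rows, cols, conf = pseudo_label(Y_obs, top_k=10)
U, V = fine_tune(Y_obs, rows, cols, conf)
print("pseudo-labels that are hidden true positives:", int(Y_true[rows, cols].sum()))
```

Giving each pseudo-label a weight below 1 means a wrong pseudo-label pulls the fine-tuned factors less than an observed association does, which is one simple way to limit overfitting to easily found pseudo-labels.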

Source journal

BMC Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months

Journal description: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.