GRL–PUL: predicting microbe–drug association based on graph representation learning and positive unlabeled learning

IF 2.4, Zone 4 (Biology), JCR Q3 (Biochemistry & Molecular Biology)

Molecular Omics, Pub Date: 2024-11-04, DOI: 10.1039/D4MO00117F

Jinqing Liang, Yuping Sun and Jie Ling

Published in Molecular Omics, volume 1, pages 38-50 (journal article). Landing page: https://pubs.rsc.org/en/content/articlelanding/2025/mo/d4mo00117f

Citation count: 0

Abstract

Extensive research has confirmed that microorganisms are widespread in the human body and have a crucial impact on human health, and that drugs are an effective means of regulating them. It is therefore essential to identify potential microbe–drug associations (MDAs). Because wet-lab experiments are costly and time-consuming, computational methods that frame MDA prediction as a binary classification task have become valuable alternatives to traditional experimental approaches. Since existing datasets contain no validated negative MDAs, most methods randomly sample negatives from unlabeled data, which evidently introduces false negatives. In this manuscript, we propose a novel model based on graph representation learning and positive-unlabeled learning (GRL–PUL) to infer potential MDAs. First, we screen reliable negative samples by applying weighted matrix factorization and the PU-bagging strategy to the known microbe–drug bipartite network. Then, we combine multi-modal attributes and construct a microbe–drug heterogeneous network. Next, a graph attention auto-encoder module, an encoder combining graph convolutional networks and graph attention networks, is introduced to extract informative embeddings from the microbe–drug heterogeneous network. Lastly, we adopt a modified random forest as the final classifier. Comparison experiments with five baseline models on three benchmark datasets show that our model surpasses the other methods in terms of AUC, AUPR, ACC, F1-score and MCC. Moreover, several case studies show that GRL–PUL can effectively predict latent MDAs. Notably, we further verify the effectiveness of the reliable negative sample selection module by migrating it to other state-of-the-art models, and the experimental results demonstrate its ability to substantially improve their prediction performance.
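The reliable-negative screening step can be illustrated with a minimal PU-bagging sketch. This is not the authors' implementation: the abstract does not specify the base learner or the feature representation, so the nearest-centroid scorer, the function names `pu_bagging_scores` and `reliable_negatives`, and the parameters `n_rounds` and `seed` are all assumptions made for illustration, and the weighted matrix factorization step is not shown. The idea is to repeatedly treat a bootstrap sample of unlabeled pairs as provisional negatives, score the out-of-bag unlabeled pairs, and keep the pairs with the lowest average score as reliable negatives.

```python
import random
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Average out-of-bag 'positive-likeness' score for each unlabeled sample.

    Assumption: a nearest-centroid scorer stands in for the base classifier.
    Scores lie in [0, 1]; higher means closer to the known positives.
    """
    rng = random.Random(seed)
    k = len(positives)  # bag size matches the number of known positives
    n = len(unlabeled)
    totals = [0.0] * n
    counts = [0] * n
    pos_c = centroid(positives)
    for _ in range(n_rounds):
        # Bootstrap (with replacement) a bag of provisional negatives.
        bag = [rng.randrange(n) for _ in range(k)]
        in_bag = set(bag)
        neg_c = centroid([unlabeled[i] for i in bag])
        # Score only the samples that were left out of this bag.
        for i, x in enumerate(unlabeled):
            if i in in_bag:
                continue
            d_pos, d_neg = sq_dist(x, pos_c), sq_dist(x, neg_c)
            denom = d_pos + d_neg
            totals[i] += d_neg / denom if denom > 0 else 0.5
            counts[i] += 1
    # Samples that were never out of bag get a neutral score of 0.5.
    return [t / c if c else 0.5 for t, c in zip(totals, counts)]


def reliable_negatives(positives, unlabeled, n_select, n_rounds=50, seed=0):
    """Indices of the n_select unlabeled samples with the lowest scores."""
    scores = pu_bagging_scores(positives, unlabeled, n_rounds, seed)
    order = sorted(range(len(unlabeled)), key=scores.__getitem__)
    return order[:n_select]
```

Calling `reliable_negatives(positives, unlabeled, n_select)` with feature vectors for known and unlabeled microbe–drug pairs returns the indices of the unlabeled pairs least similar to the known positives. In practice the scorer would be replaced by a stronger base classifier, and the selected pairs would serve as the negative class for the final classifier.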

Source journal: Molecular Omics (Biochemistry, Genetics and Molecular Biology: Biochemistry)

CiteScore: 5.40

Self-citation rate: 3.40%

Publication count:

Journal description: Molecular Omics publishes high-quality research from across the -omics sciences. Topics include, but are not limited to:

-omics studies to gain mechanistic insight into biological processes, for example, determining the mode of action of a drug or the basis of a particular phenotype, such as drought tolerance

-omics studies for clinical applications with validation, such as finding biomarkers for diagnostics or potential new drug targets

-omics studies looking at the sub-cellular make-up of cells, for example, the subcellular localisation of certain proteins or post-translational modifications or new imaging techniques

studies presenting new methods and tools to support omics studies, including new spectroscopic/chromatographic techniques, chip-based/array technologies and new classification/data analysis techniques. New methods should be proven and demonstrate an advance in the field.

Molecular Omics only accepts articles of high importance and interest that provide significant new insight into important chemical or biological problems. This could be fundamental research that significantly increases understanding or research that demonstrates clear functional benefits. Papers reporting new results that could be routinely predicted, do not show a significant improvement over known research, or are of interest only to the specialist in the area are not suitable for publication in Molecular Omics.