Ensemble learning methods and heterogeneous graph network fusion: building drug-gene-disease triple association prediction models.

IF 7.7 | Zone 2, Biology | JCR Q1, Biochemical Research Methods

Briefings in Bioinformatics | Pub Date: 2025-07-02 | DOI: 10.1093/bib/bbaf369

Keichin N G

Volume 26, issue 4 | Publication type: Journal Article | Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12286780/pdf/

Citations: 0

Abstract

Potential association data among drugs, genes, and diseases are sparse and complex. Existing models struggle to handle heterogeneous relationships and multi-source data fusion at the same time, which limits the accuracy and generalization of association prediction. To address this problem, we propose a method that fuses a relational graph convolutional network (R-GCN) with eXtreme Gradient Boosting (XGBoost). First, a heterogeneous graph containing drug, gene, and disease nodes and their relationships is constructed. R-GCN then aggregates the features of the different node types to generate high-quality node embeddings. Next, the embedded features of the drug-gene-disease triples are fed into an XGBoost model, which is trained to perform the association prediction task. The model achieves an area under the curve (AUC) of 0.92 and an F1 score of 0.85, indicating strong predictive ability. The method addresses association prediction in complex biological networks and offers new technical support for precision medicine.
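The pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy graph, the node and relation names, the embedding size `DIM`, and the helper functions are all assumptions made for illustration, and the weights are random rather than trained. The layer follows the standard R-GCN update, in which each node combines a self-loop term with a per-relation mean over its neighbours before a ReLU. Triple features are formed by simple concatenation, which is one plausible reading of "embedded features of the triples".

```python
import random

# Toy heterogeneous graph (hypothetical nodes and relations, for illustration only).
NODES = ["drug:A", "drug:B", "gene:G1", "gene:G2", "disease:D1"]
EDGES = [
    ("drug:A", "targets", "gene:G1"),
    ("drug:B", "targets", "gene:G2"),
    ("gene:G1", "associated_with", "disease:D1"),
    ("gene:G2", "associated_with", "disease:D1"),
    ("drug:A", "treats", "disease:D1"),
]
DIM = 4  # assumed embedding size


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def random_matrix(rng, rows, cols):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One R-GCN layer: self-loop plus per-relation mean of neighbour messages, then ReLU."""
    # Group incoming neighbours by (target node, relation type).
    incoming = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    # Self-loop term W_0 h_i for every node.
    out = {n: matvec(self_weight, h) for n, h in features.items()}

    # Add the normalised, relation-specific messages W_r h_j.
    for (dst, rel), srcs in incoming.items():
        norm = 1.0 / len(srcs)  # 1 / c_{i,r}, the neighbour count under relation r
        for src in srcs:
            msg = matvec(rel_weights[rel], features[src])
            out[dst] = [o + norm * m for o, m in zip(out[dst], msg)]

    return {n: [max(0.0, v) for v in h] for n, h in out.items()}


def triple_features(emb, drug, gene, disease):
    """Concatenate the three node embeddings into one feature vector."""
    return emb[drug] + emb[gene] + emb[disease]


rng = random.Random(0)
features = {n: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for n in NODES}

# Add inverse edges so that information flows in both directions.
all_edges = EDGES + [(dst, rel + "_inv", src) for src, rel, dst in EDGES]
relations = sorted({rel for _, rel, _ in all_edges})
rel_weights = {rel: random_matrix(rng, DIM, DIM) for rel in relations}
self_weight = random_matrix(rng, DIM, DIM)

embeddings = rgcn_layer(features, all_edges, rel_weights, self_weight)
x = triple_features(embeddings, "drug:A", "gene:G1", "disease:D1")
print(len(x))  # 12, i.e. three embeddings of size DIM
```

In a full pipeline, vectors such as `x` for known and sampled negative triples, together with their 0/1 labels, would be passed to a gradient-boosted classifier (for example `xgboost.XGBClassifier`), and AUC and F1 would be computed on held-out triples. That step is omitted here to keep the sketch free of third-party dependencies.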

Source journal

Briefings in Bioinformatics (Biology: Biochemical Research Methods)

CiteScore: 13.20

Self-citation rate: 13.70%

Articles published: 549

Review time: 6 months

Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.