MGDRGCN: A novel framework for predicting metabolite–disease connections using tripartite network and relational graph convolutional network
Authors: Pengli Lu, Ling Li
Journal: Journal of Computational Science, Volume 85, Article 102477 (published February 2025)
DOI: 10.1016/j.jocs.2024.102477
URL: https://www.sciencedirect.com/science/article/pii/S1877750324002709
Journal Impact Factor: 3.1 (JCR Q2, Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
Metabolites are fundamental to the existence of biomolecules, and numerous studies have demonstrated that uncovering the connections between metabolites and diseases can enhance our understanding of disease pathogenesis. Traditional biological methods can identify potential metabolite–disease relationships, but these approaches often require significant human and material resources. Consequently, computational methods have emerged as a more efficient alternative. However, most computational methods rely primarily on known metabolite–disease associations and rarely incorporate other biological entities. To address this issue, we propose a novel computational framework based on a metabolite–gene–disease tripartite heterogeneous network and a relational graph convolutional network (R-GCN), abbreviated as MGDRGCN. Specifically, we construct three types of similarity networks from multiple data sources: metabolite and gene functional similarity, disease semantic similarity, and Gaussian interaction profile (GIP) kernel similarity for metabolites and diseases. Next, we use principal component analysis (PCA) to further extract features and construct a tripartite heterogeneous network in which genes serve as the bridge between metabolites and diseases. This network structure captures and represents the complex relationships among metabolites, genes, and diseases. We then employ R-GCN to extract higher-order information from the tripartite heterogeneous network. Finally, we feed the embeddings learned by R-GCN into a residual network classifier to predict potential metabolite–disease associations. In five-fold cross-validation experiments, MGDRGCN exhibits outstanding performance, with both AUC (0.9866) and AUPR (0.9865) significantly surpassing other advanced methods. Additionally, case studies further demonstrate MGDRGCN’s superior performance in predicting metabolite–disease associations.
Overall, the introduction of MGDRGCN provides new perspectives and methods for future biomedical research, offering promising potential for uncovering the mechanisms of complex biological systems.
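The abstract lists Gaussian interaction profile (GIP) kernel similarity as one of the similarity sources for metabolites and diseases. As an illustration only, the sketch below uses the formulation common in the association-prediction literature (van Laarhoven et al., 2011): each row of a binary association matrix is an interaction profile, and similarity decays with the squared Euclidean distance between profiles. The bandwidth normalization, the default `gamma_prime = 1.0`, the helper name `gip_kernel`, and the toy matrix are all assumptions, not details confirmed for MGDRGCN.

```python
import math
from typing import List

Matrix = List[List[float]]


def gip_kernel(assoc: Matrix, gamma_prime: float = 1.0) -> Matrix:
    """GIP kernel similarity between the rows of an association matrix.

    Row i of `assoc` is the interaction profile IP(i), e.g. a metabolite's
    0/1 links to every disease. The bandwidth uses the common normalization
    gamma = gamma_prime / mean_i ||IP(i)||^2; gamma_prime = 1.0 is an
    assumed default, not a value reported for MGDRGCN.
    """
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq_norm = sum(sq_norms) / n
    # An all-zero matrix has no defined bandwidth; fall back to gamma = 0.
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Toy metabolite x disease association matrix (hypothetical data).
A = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
metabolite_sim = gip_kernel(A)                        # rows = metabolites
disease_sim = gip_kernel([list(c) for c in zip(*A)])  # transpose: rows = diseases
```

The same function serves both entity types: metabolite similarity uses the rows of the association matrix and disease similarity uses its columns, which is why the disease call simply transposes `A`.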
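To make "R-GCN extracts higher-order information from the tripartite network" concrete, here is a minimal pure-Python sketch of one relational graph convolution step in the standard form of Schlichtkrull et al. (2018): h_i' = ReLU(W_0 h_i + Σ_r Σ_{j∈N_i^r} W_r h_j / |N_i^r|). The relation names (`m->g`, `g->d`, and their inverses), the ReLU activation, the |N_i^r| normalization, and the identity weights in the toy run are assumptions for illustration; the abstract does not specify MGDRGCN's relation set, layer count, or weight parameterization.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Edge = Tuple[int, str, int]  # (source node, relation type, target node)


def matvec(w: Matrix, h: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) weight matrix by a feature vector."""
    return [sum(wk * hk for wk, hk in zip(row, h)) for row in w]


def rgcn_layer(feats: List[Vector], edges: List[Edge],
               w_self: Matrix, w_rel: Dict[str, Matrix]) -> List[Vector]:
    """One R-GCN propagation step in the Schlichtkrull et al. form:

        h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_i^r} W_r h_j / |N_i^r|)

    Messages flow along each directed edge from source to target.
    """
    # Group each node's incoming neighbours by relation type: N_i^r.
    neighbours: Dict[Tuple[int, str], List[int]] = {}
    for src, rel, dst in edges:
        neighbours.setdefault((dst, rel), []).append(src)

    # Self-connection term W_0 h_i for every node.
    out = [matvec(w_self, h) for h in feats]

    # Relation-specific messages, normalized by c_{i,r} = |N_i^r|.
    for (dst, rel), srcs in neighbours.items():
        for src in srcs:
            msg = matvec(w_rel[rel], feats[src])
            for k, value in enumerate(msg):
                out[dst][k] += value / len(srcs)

    # Element-wise ReLU activation.
    return [[max(0.0, x) for x in row] for row in out]


# Toy tripartite graph (hypothetical): node 0 = metabolite, 1 = gene, 2 = disease.
# Each relation gets an explicit inverse so information flows both ways.
edges = [(0, "m->g", 1), (1, "g->m", 0), (1, "g->d", 2), (2, "d->g", 1)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
w_rel = {rel: identity for _, rel, _ in edges}
h1 = rgcn_layer(feats, edges, identity, w_rel)
h2 = rgcn_layer(h1, edges, identity, w_rel)  # second layer: 2-hop information
```

After one layer the metabolite (node 0) has only seen the gene; after the second layer its embedding also reflects the disease, reached through the gene. This is the sense in which stacking layers lets genes act as a bridge carrying higher-order metabolite–disease information.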
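The reported AUC comes from five-fold cross-validation. The sketch below shows, under stated assumptions, the two generic pieces such an evaluation needs: a seeded shuffle that deals candidate metabolite–disease pairs into five disjoint folds, and ROC AUC computed from its rank interpretation (the probability that a random positive pair outscores a random negative pair, with ties counted as one half). The helper names `kfold_indices` and `roc_auc` are hypothetical, AUPR is not computed here, and the paper's actual protocol (for example, how negative pairs are sampled) is not given in the abstract.

```python
import random
from typing import List, Sequence


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank interpretation: the probability that a random
    positive pair is scored above a random negative pair (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

In a cross-validation loop, each fold in turn would be held out, a model trained on the remaining four folds would score the held-out pairs, and the five `roc_auc` values would be averaged. The all-pairs comparison is quadratic and meant only to show the definition; for large candidate sets a sort-based rank computation (or a library routine) is the practical choice.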
About the journal:
Computational Science is a rapidly growing multi- and interdisciplinary field that uses advanced computing and data analysis to understand and solve complex problems. It has reached a level of predictive capability that now firmly complements the traditional pillars of experimentation and theory.
Recent advances in experimental techniques, such as detectors, online sensor networks, and high-resolution imaging, have opened new windows into physical and biological processes at many levels of detail. The resulting data explosion enables detailed data-driven modeling and simulation.
This new discipline in science combines computational thinking, modern computational methods, devices and collateral technologies to address problems far beyond the scope of traditional numerical methods.
Computational science typically unifies three distinct elements:
• Modeling, Algorithms and Simulations (e.g., numerical and non-numerical, discrete and continuous);
• Software developed to solve problems in science (e.g., biological, physical, and social), engineering, medicine, and the humanities;
• Computer and information science that develops and optimizes advanced system hardware, software, networking, and data management components (e.g., problem-solving environments).