Heterogeneous graph collaborative representation learning for drug-related microbe prediction with attentive fusion and reciprocal distillation
Yanbu Guo, Quanming Guo, Shengli Song, Yihan Wang, Jinde Cao
Knowledge-Based Systems, vol. 330, Article 114548. Published 2025-09-25. DOI: 10.1016/j.knosys.2025.114548
https://www.sciencedirect.com/science/article/pii/S0950705125015874
Impact factor: 7.6 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Microbes are microorganisms with significant therapeutic potential for treating diseases, underscoring the need for computational methods to screen microbes targeting disease-associated drugs. However, existing computational methods often consider only node embeddings or structural features between microbes and drugs, and they suffer from the severe class imbalance inherent in sparse association data. In this work, we propose a heterogeneous graph collaborative representation learning model that combines the merits of attentive fusion and reciprocal distillation for drug-related microbe prediction. First, we construct heterogeneous biological information graphs and meta-path-induced graphs of microbes and drugs. Then, a topological structure feature encoder extracts complex topological and semantic interaction patterns from the heterogeneous biological graphs of microbes and drugs, while an efficient transformer concurrently extracts discriminative semantic and structural information based on the graph position information of nodes. Next, a reciprocal distillation scheme mitigates the adverse effects of data imbalance and enforces distribution consistency between the model's topological and semantic information extraction. Moreover, a dual collaborative feature fusion scheme combines graph topological features with dual meta-path-based semantic features to obtain discriminative features of microbes and drugs. Through reciprocal distillation, an efficient optimization function focuses on hard-to-classify samples of drug-related microbes via these discriminative features. Extensive experiments demonstrate that our model can handle the association sparsity problem and extract richer semantic and structural information. Meanwhile, case studies indicate that our model can discover reliable candidate microbes associated with a specific drug.
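The abstract does not give the loss formulas, so the following is only a minimal sketch of how the two ideas it names could fit together, under stated assumptions: reciprocal distillation is approximated by a symmetric KL divergence between the association probabilities predicted by a topological branch and a semantic branch, and the "focus on hard-to-classify samples" is approximated by a standard binary focal loss. The function names, the weights `gamma`, `alpha`, and `lam`, and the way the terms are summed are all hypothetical and are not taken from the paper.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one predicted probability p and label y in {0, 1}.

    Assumption: a focal-style loss stands in for the paper's optimization function.
    """
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight (illustrative default)
    # (1 - p_t)^gamma shrinks the loss of well-classified samples,
    # so hard samples dominate the total.
    return -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

def bernoulli_kl(p, q, eps=1e-12):
    """KL(Bernoulli(p) || Bernoulli(q)), with probabilities clipped for stability."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def reciprocal_distillation(p_topo, p_sem):
    """Symmetric KL: each branch acts as teacher for the other (an assumption).

    In a gradient-based framework the teacher side of each term would
    typically be detached; that detail is omitted in this scalar sketch.
    """
    return bernoulli_kl(p_topo, p_sem) + bernoulli_kl(p_sem, p_topo)

def total_loss(p_topo, p_sem, labels, lam=0.5):
    """Mean supervised focal loss on both branches plus lam * mean distillation term."""
    n = len(labels)
    supervised = sum(
        focal_loss(pt, y) + focal_loss(ps, y)
        for pt, ps, y in zip(p_topo, p_sem, labels)
    ) / n
    distill = sum(
        reciprocal_distillation(pt, ps) for pt, ps in zip(p_topo, p_sem)
    ) / n
    return supervised + lam * distill
```

For a positive pair, `focal_loss(0.9, 1)` is much smaller than `focal_loss(0.6, 1)`, which is the sense in which the loss concentrates on hard samples, and `reciprocal_distillation` is zero only when the two branches agree.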
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.