DA-HGL: a domain-augmented heterogeneous graph learning framework for protein function prediction
Authors: Sai Hu, Wei Zhang, Bihai Zhao
Journal: Briefings in Bioinformatics, 26(5) (JCR Q1, Biochemical Research Methods; impact factor 7.7)
Published: 2025-08-31
DOI: https://doi.org/10.1093/bib/bbaf511
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12476837/pdf/
Citations: 0
Abstract
Accurate protein function prediction is critical for deciphering disease mechanisms and advancing precision medicine, yet remains challenging for proteins with sparse annotations. Traditional methods struggle with annotation sparsity and fail to integrate multimodal data holistically. We propose DA-HGL, a heterogeneous graph learning framework that integrates protein sequences, domain architectures, and Gene Ontology (GO) hierarchies through a multilayered graph and non-negative matrix factorization with dual biological constraints. DA-HGL uniquely models domain-function coherence, GO semantic consistency, and topological congruence. Evaluated on yeast and human proteomes, DA-HGL achieves Fmax gains of 9.0% (yeast CC) and 17.2% (human BP) over state-of-the-art methods. By dynamically learning domain-context associations and resolving annotation sparsity, DA-HGL excels in cold-start scenarios and disease-specific predictions (e.g. Parkinson's "ubiquitin-dependent catabolism"). This framework offers a robust tool for accelerating functional genomics and precision medicine. Code/data: https://github.com/husaiccsu/DA-HGL.
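The abstract reports its gains in Fmax without defining the metric. As a rough illustration only, the sketch below assumes the protein-centric definition commonly used in CAFA-style evaluations: at each score threshold, precision is averaged over proteins with at least one predicted term, recall is averaged over proteins with at least one true term, and Fmax is the best resulting F-score. The function name `fmax`, its arguments, and the threshold grid are hypothetical choices for this example and are not taken from the paper or its repository; the authors' evaluation protocol may differ.

```python
import numpy as np


def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax under an assumed CAFA-style definition.

    y_true  : (n_proteins, n_terms) binary matrix of true GO annotations.
    y_score : (n_proteins, n_terms) matrix of predicted scores in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        # Assumed grid of 100 thresholds; not specified in the source.
        thresholds = np.linspace(0.01, 1.0, 100)

    n_true = y_true.sum(axis=1)   # true terms per protein
    has_truth = n_true > 0        # proteins that count toward recall
    best = 0.0

    for t in thresholds:
        y_pred = y_score >= t
        n_pred = y_pred.sum(axis=1)   # predicted terms per protein
        covered = n_pred > 0          # proteins that count toward precision
        if not covered.any() or not has_truth.any():
            continue
        tp = (y_pred & y_true).sum(axis=1)
        precision = (tp[covered] / n_pred[covered]).mean()
        recall = (tp[has_truth] / n_true[has_truth]).mean()
        if precision + recall > 0:
            f_score = 2 * precision * recall / (precision + recall)
            best = max(best, f_score)
    return best


if __name__ == "__main__":
    # Toy example: two proteins, three GO terms.
    truth = [[1, 0, 1], [0, 1, 0]]
    scores = [[0.9, 0.2, 0.8], [0.1, 0.7, 0.3]]
    print(fmax(truth, scores))
```

In the toy example every true term scores above every false one, so some threshold separates them perfectly and the function returns 1.0. Real evaluations typically compute this separately for each GO aspect (BP, MF, CC).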
About the journal:
Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data.
The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.