{"title":"一种增强复杂微生物群落基因组恢复的图对比学习方法。","authors":"Guo Wei, Yan Liu","doi":"10.3390/e27090921","DOIUrl":null,"url":null,"abstract":"<p><p>Accurate genome binning is essential for resolving microbial community structure and functional potential from metagenomic data. However, existing approaches-primarily reliant on tetranucleotide frequency (TNF) and abundance profiles-often perform sub-optimally in the face of complex community compositions, low-abundance taxa, and long-read sequencing datasets. To address these limitations, we present MBGCCA, a novel metagenomic binning framework that synergistically integrates graph neural networks (GNNs), contrastive learning, and information-theoretic regularization to enhance binning accuracy, robustness, and biological coherence. MBGCCA operates in two stages: (1) multimodal information integration, where TNF and abundance profiles are fused via a deep neural network trained using a multi-view contrastive loss, and (2) self-supervised graph representation learning, which leverages assembly graph topology to refine contig embeddings. The contrastive learning objective follows the InfoMax principle by maximizing mutual information across augmented views and modalities, encouraging the model to extract globally consistent and high-information representations. By aligning perturbed graph views while preserving topological structure, MBGCCA effectively captures both global genomic characteristics and local contig relationships. Comprehensive evaluations using both synthetic and real-world datasets-including wastewater and soil microbiomes-demonstrate that MBGCCA consistently outperforms state-of-the-art binning methods, particularly in challenging scenarios marked by sparse data and high community complexity. These results highlight the value of entropy-aware, topology-preserving learning for advancing metagenomic genome reconstruction.</p>","PeriodicalId":11694,"journal":{"name":"Entropy","volume":"27 9","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468353/pdf/","citationCount":"0","resultStr":"{\"title\":\"A Graph Contrastive Learning Method for Enhancing Genome Recovery in Complex Microbial Communities.\",\"authors\":\"Guo Wei, Yan Liu\",\"doi\":\"10.3390/e27090921\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Accurate genome binning is essential for resolving microbial community structure and functional potential from metagenomic data. However, existing approaches-primarily reliant on tetranucleotide frequency (TNF) and abundance profiles-often perform sub-optimally in the face of complex community compositions, low-abundance taxa, and long-read sequencing datasets. To address these limitations, we present MBGCCA, a novel metagenomic binning framework that synergistically integrates graph neural networks (GNNs), contrastive learning, and information-theoretic regularization to enhance binning accuracy, robustness, and biological coherence. MBGCCA operates in two stages: (1) multimodal information integration, where TNF and abundance profiles are fused via a deep neural network trained using a multi-view contrastive loss, and (2) self-supervised graph representation learning, which leverages assembly graph topology to refine contig embeddings. The contrastive learning objective follows the InfoMax principle by maximizing mutual information across augmented views and modalities, encouraging the model to extract globally consistent and high-information representations. By aligning perturbed graph views while preserving topological structure, MBGCCA effectively captures both global genomic characteristics and local contig relationships. Comprehensive evaluations using both synthetic and real-world datasets-including wastewater and soil microbiomes-demonstrate that MBGCCA consistently outperforms state-of-the-art binning methods, particularly in challenging scenarios marked by sparse data and high community complexity. These results highlight the value of entropy-aware, topology-preserving learning for advancing metagenomic genome reconstruction.</p>\",\"PeriodicalId\":11694,\"journal\":{\"name\":\"Entropy\",\"volume\":\"27 9\",\"pages\":\"\"},\"PeriodicalIF\":2.0000,\"publicationDate\":\"2025-08-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12468353/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Entropy\",\"FirstCategoryId\":\"101\",\"ListUrlMain\":\"https://doi.org/10.3390/e27090921\",\"RegionNum\":3,\"RegionCategory\":\"物理与天体物理\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"PHYSICS, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Entropy","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.3390/e27090921","RegionNum":3,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"PHYSICS, MULTIDISCIPLINARY","Score":null,"Total":0}
A Graph Contrastive Learning Method for Enhancing Genome Recovery in Complex Microbial Communities.
Accurate genome binning is essential for resolving microbial community structure and functional potential from metagenomic data. However, existing approaches-primarily reliant on tetranucleotide frequency (TNF) and abundance profiles-often perform sub-optimally in the face of complex community compositions, low-abundance taxa, and long-read sequencing datasets. To address these limitations, we present MBGCCA, a novel metagenomic binning framework that synergistically integrates graph neural networks (GNNs), contrastive learning, and information-theoretic regularization to enhance binning accuracy, robustness, and biological coherence. MBGCCA operates in two stages: (1) multimodal information integration, where TNF and abundance profiles are fused via a deep neural network trained using a multi-view contrastive loss, and (2) self-supervised graph representation learning, which leverages assembly graph topology to refine contig embeddings. The contrastive learning objective follows the InfoMax principle by maximizing mutual information across augmented views and modalities, encouraging the model to extract globally consistent and high-information representations. By aligning perturbed graph views while preserving topological structure, MBGCCA effectively captures both global genomic characteristics and local contig relationships. Comprehensive evaluations using both synthetic and real-world datasets-including wastewater and soil microbiomes-demonstrate that MBGCCA consistently outperforms state-of-the-art binning methods, particularly in challenging scenarios marked by sparse data and high community complexity. These results highlight the value of entropy-aware, topology-preserving learning for advancing metagenomic genome reconstruction.
期刊介绍:
Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. Our aim is to encourage scientists to publish as much as possible their theoretical and experimental details. There is no restriction on the length of the papers. If there are computation and the experiment, the details must be provided so that the results can be reproduced.