Apurva Narechania, Dean Bobo, Rob DeSalle, Barun Mathema, Barry Kreiswirth, Paul J Planet
Title: What do we gain when tolerating loss? The information bottleneck wrings out recombination.
Journal: Molecular Biology and Evolution
Publication date: 2025-02-03
Publication type: Journal Article
DOI: 10.1093/molbev/msaf029 (https://doi.org/10.1093/molbev/msaf029)
Citations: 0
Abstract
Most microbes have the capacity to acquire genetic material from their environment. Recombination of foreign DNA yields genomes that are, at least in part, incongruent with the vertical history of their species. Dominant approaches for detecting these transfers are phylogenetic, requiring a painstaking series of analyses including alignment and tree reconstruction. But these methods do not scale. Here we propose an unsupervised, alignment-free, and tree-free technique based on the sequential information bottleneck (SIB), an optimization procedure designed to extract some portion of relevant information from one random variable conditioned on another. In our case, this joint probability distribution tabulates occurrence counts of k-mers against their genomes of origin, with the expectation that recombination will create a strong signal that unifies certain sets of co-occurring k-mers. We conceptualize the technique as a rate-distortion problem, measuring distortion in the relevance information as k-mers are compressed into clusters based on their co-occurrence in the source genomes. The result is fast, model-free, lossy compression of k-mers into learned groups of shared genome sequence, differentiating recombined elements from the vertically inherited core. We show that the technique yields a new recombination measure based purely on information, divorced from any biases and limitations inherent to alignment and phylogeny.
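The idea in the abstract can be illustrated with a small sketch: tabulate k-mer counts against genomes, normalize them into a joint distribution p(x, y), and greedily reassign k-mers among a fixed number of clusters so that the compressed variable T retains as much information about genome of origin Y as possible. This is not the authors' implementation. The function names, the toy sequences, the choice of k, and the simplified greedy update (direct maximization of I(T;Y) rather than a Jensen-Shannon merge cost) are all assumptions made for illustration.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in one sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def joint_distribution(genomes, k):
    """Return (kmers, names, p) with p[i][j] = p(kmer_i, genome_j)."""
    names = sorted(genomes)
    counts = {n: kmer_counts(genomes[n], k) for n in names}
    kmers = sorted(set().union(*counts.values()))
    table = [[counts[n][km] for n in names] for km in kmers]
    total = sum(map(sum, table))
    return kmers, names, [[c / total for c in row] for row in table]

def mutual_information(p):
    """Mutual information (bits) between rows and columns of a joint table."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return sum(v * math.log2(v / (row[i] * col[j]))
               for i, r in enumerate(p) for j, v in enumerate(r) if v > 0)

def compress(p, assign, n_clusters):
    """Hard clustering: p(t, y) is the sum of p(x, y) over k-mers x in t."""
    merged = [[0.0] * len(p[0]) for _ in range(n_clusters)]
    for r, t in zip(p, assign):
        for j, v in enumerate(r):
            merged[t][j] += v
    return merged

def sib(p, n_clusters, sweeps=10):
    """Simplified sequential pass (an assumption, not the published procedure):
    visit k-mers one at a time and move each to the cluster that strictly
    increases I(T;Y); stop when a full sweep changes nothing."""
    assign = [i % n_clusters for i in range(len(p))]  # arbitrary start
    for _ in range(sweeps):
        changed = False
        for i in range(len(p)):
            best_t = assign[i]
            best_mi = mutual_information(compress(p, assign, n_clusters))
            for t in range(n_clusters):
                if t == assign[i]:
                    continue
                trial = assign[:i] + [t] + assign[i + 1:]
                mi = mutual_information(compress(p, trial, n_clusters))
                if mi > best_mi + 1e-12:
                    best_t, best_mi = t, mi
            if best_t != assign[i]:
                assign[i] = best_t
                changed = True
        if not changed:
            break
    return assign

# Toy input (made-up sequences): g1 and g3 share a segment that g2 lacks.
genomes = {
    "g1": "ACGTACGTTTGACATTGACA",
    "g2": "ACGTACGTCCCGGGCCCGGG",
    "g3": "ACGTACGTTTGACATTGACA",
}
kmers, names, p = joint_distribution(genomes, k=4)
assign = sib(p, n_clusters=2)
full_mi = mutual_information(p)
kept_mi = mutual_information(compress(p, assign, 2))
print(f"I(X;Y) = {full_mi:.3f} bits, I(T;Y) = {kept_mi:.3f} bits")
```

The gap between I(X;Y) and I(T;Y) is the relevance information lost to compression, which is the distortion the abstract refers to. The sketch recomputes mutual information from scratch for every trial move, so it is only practical for toy inputs.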
About the journal:
Molecular Biology and Evolution
Journal Overview:
Publishes research at the interface of molecular (including genomics) and evolutionary biology
Considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic
Interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research
Publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.