Reconstructing rearrangement phylogenies of natural genomes.

IF 1.7 4区生物学 Q4 BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology Pub Date : 2025-06-07 DOI:10.1186/s13015-025-00279-5

Leonard Bohnenkämper, Jens Stoye, Daniel Doerr

{"title":"Reconstructing rearrangement phylogenies of natural genomes.","authors":"Leonard Bohnenkämper, Jens Stoye, Daniel Doerr","doi":"10.1186/s13015-025-00279-5","DOIUrl":null,"url":null,"abstract":"Background: We study the classical problem of inferring ancestral genomes from a set of extant genomes under a given phylogeny, known as the Small Parsimony Problem (SPP). Genomes are represented as sequences of oriented markers, organized in one or more linear or circular chromosomes. Any marker may appear in several copies, without restriction on orientation or genomic location, known as the natural genomes model. Evolutionary events along the branches of the phylogeny encompass large scale rearrangements, including segmental inversions, translocations, gain and loss (DCJ-indel model). Even under simpler rearrangement models, such as the classical breakpoint model without duplicates, the SPP is computationally intractable. Nevertheless, the SPP for natural genomes under the DCJ-indel model has been studied recently, with limited success.Methods: Building on prior work, we present a highly optimized ILP that is able to solve the SPP for sufficiently small phylogenies and gene families. A notable improvement w.r.t. the previous result is an optimized way of handling both circular and linear chromosomes. This is especially relevant to the SPP, since the chromosomal structure of ancestral genomes is unknown and the solution space for this chromosomal structure is typically large.Results: We benchmark our method on simulated and real data. On simulated phylogenies we observe a considerable performance improvement on problems that include linear chromosomes. And even when the ground truth contains only one circular chromosome per genome, our method outperforms its predecessor due to its optimized handling of the solution space. The practical advantage becomes also visible in an analysis of seven Anopheles taxa.","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"10"},"PeriodicalIF":1.7000,"publicationDate":"2025-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12144824/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-025-00279-5","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Background: We study the classical problem of inferring ancestral genomes from a set of extant genomes under a given phylogeny, known as the Small Parsimony Problem (SPP). Genomes are represented as sequences of oriented markers, organized in one or more linear or circular chromosomes. Any marker may appear in several copies, without restriction on orientation or genomic location, known as the natural genomes model. Evolutionary events along the branches of the phylogeny encompass large scale rearrangements, including segmental inversions, translocations, gain and loss (DCJ-indel model). Even under simpler rearrangement models, such as the classical breakpoint model without duplicates, the SPP is computationally intractable. Nevertheless, the SPP for natural genomes under the DCJ-indel model has been studied recently, with limited success.

Methods: Building on prior work, we present a highly optimized ILP that is able to solve the SPP for sufficiently small phylogenies and gene families. A notable improvement w.r.t. the previous result is an optimized way of handling both circular and linear chromosomes. This is especially relevant to the SPP, since the chromosomal structure of ancestral genomes is unknown and the solution space for this chromosomal structure is typically large.

Results: We benchmark our method on simulated and real data. On simulated phylogenies we observe a considerable performance improvement on problems that include linear chromosomes. And even when the ground truth contains only one circular chromosome per genome, our method outperforms its predecessor due to its optimized handling of the solution space. The practical advantage becomes also visible in an analysis of seven Anopheles taxa.

Abstract Image

查看原文本刊更多论文

重建自然基因组重排系统发育。

背景：我们研究了从给定系统发育下的一组现存基因组推断祖先基因组的经典问题，称为小简约问题（SPP）。基因组表示为定向标记序列，组织在一个或多个线性或圆形染色体中。任何标记都可以出现在多个副本中，不受方向或基因组位置的限制，称为自然基因组模型。沿着系统发育分支的进化事件包括大规模的重排，包括片段倒置、易位、获得和损失（DCJ-indel模型）。即使在更简单的重排模型下，例如没有重复的经典断点模型，SPP在计算上也是难以处理的。尽管如此，在DCJ-indel模型下对自然基因组的SPP进行了研究，但收效甚微。方法：在先前工作的基础上，我们提出了一个高度优化的ILP，能够解决足够小的系统发育和基因家族的SPP问题。与之前的结果相比，一个显著的改进是处理圆形和线形染色体的优化方法。这与SPP特别相关，因为祖先基因组的染色体结构是未知的，而且这种染色体结构的解空间通常很大。结果：我们在模拟和真实数据上对我们的方法进行了基准测试。在模拟系统发育中，我们观察到在包含线性染色体的问题上有相当大的性能改进。即使当每个基因组只包含一个圆形染色体时，我们的方法也优于其前身，因为它优化了对解空间的处理。在对7个按蚊分类群的分析中，实际优势也变得明显。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Algorithms for Molecular Biology 生物-生化研究方法

CiteScore

2.40

自引率

10.00%

发文量

审稿时长

>12 weeks

期刊介绍： Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.