Orthology and near-cographs in the context of phylogenetic networks.

IF 1.7 4区生物学 Q4 BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology Pub Date : 2025-10-02 DOI:10.1186/s13015-025-00285-7

Anna Lindeberg, Guillaume E Scholz, Nicolas Wieseke, Marc Hellmuth

{"title":"Orthology and near-cographs in the context of phylogenetic networks.","authors":"Anna Lindeberg, Guillaume E Scholz, Nicolas Wieseke, Marc Hellmuth","doi":"10.1186/s13015-025-00285-7","DOIUrl":null,"url":null,"abstract":"<p><p>Orthologous genes, which arise through speciation, play a key role in comparative genomics and functional inference. In particular, graph-based methods allow for the inference of orthology estimates without prior knowledge of the underlying gene or species trees. This results in orthology graphs, where each vertex represents a gene, and an edge exists between two vertices if the corresponding genes are estimated to be orthologs. Orthology graphs inferred under a tree-like evolutionary model must be cographs. However, real-world data often deviate from this property, either due to noise in the data, errors in inference methods or, simply, because evolution follows a network-like rather than a tree-like process. The latter, in particular, raises the question of whether and how orthology graphs can be derived from or, equivalently, are explained by phylogenetic networks. In this work, we study the constraints imposed on orthology graphs when the underlying evolutionary history follows a phylogenetic network instead of a tree. We show that any orthology graph can be represented by a sufficiently complex level-k network. However, such networks lack biologically meaningful constraints. In contrast, level-1 networks provide a simpler explanation, and we establish characterizations for level-1 explainable orthology graphs, i.e., those derived from level-1 evolutionary histories. To this end, we employ modular decomposition, a classical technique for studying graph structures. Specifically, an arbitrary graph is level-1 explainable if and only if each primitive subgraph is a near-cograph (a graph in which the removal of a single vertex results in a cograph). Additionally, we present a linear-time algorithm to recognize level-1 explainable orthology graphs and to construct a level-1 network that explains them, if such a network exists. Finally, we demonstrate the close relationship of level-1 explainable orthology graphs to the substitution operation, weakly chordal and perfect graphs, as well as graphs with twin-width at most 2.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"20 1","pages":"19"},"PeriodicalIF":1.7000,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490074/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-025-00285-7","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

Abstract

Orthologous genes, which arise through speciation, play a key role in comparative genomics and functional inference. In particular, graph-based methods allow for the inference of orthology estimates without prior knowledge of the underlying gene or species trees. This results in orthology graphs, where each vertex represents a gene, and an edge exists between two vertices if the corresponding genes are estimated to be orthologs. Orthology graphs inferred under a tree-like evolutionary model must be cographs. However, real-world data often deviate from this property, either due to noise in the data, errors in inference methods or, simply, because evolution follows a network-like rather than a tree-like process. The latter, in particular, raises the question of whether and how orthology graphs can be derived from or, equivalently, are explained by phylogenetic networks. In this work, we study the constraints imposed on orthology graphs when the underlying evolutionary history follows a phylogenetic network instead of a tree. We show that any orthology graph can be represented by a sufficiently complex level-k network. However, such networks lack biologically meaningful constraints. In contrast, level-1 networks provide a simpler explanation, and we establish characterizations for level-1 explainable orthology graphs, i.e., those derived from level-1 evolutionary histories. To this end, we employ modular decomposition, a classical technique for studying graph structures. Specifically, an arbitrary graph is level-1 explainable if and only if each primitive subgraph is a near-cograph (a graph in which the removal of a single vertex results in a cograph). Additionally, we present a linear-time algorithm to recognize level-1 explainable orthology graphs and to construct a level-1 network that explains them, if such a network exists. Finally, we demonstrate the close relationship of level-1 explainable orthology graphs to the substitution operation, weakly chordal and perfect graphs, as well as graphs with twin-width at most 2.

查看原文本刊更多论文

系统发育网络中的正形学和近图。

通过物种形成产生的同源基因在比较基因组学和功能推断中起着关键作用。特别是，基于图的方法允许在没有潜在基因或物种树的先验知识的情况下推断同源估计。这就产生了正交图，其中每个顶点代表一个基因，如果估计对应的基因是正交的，则在两个顶点之间存在一条边。在树状进化模型下推断的正交图必须是图。然而，现实世界的数据经常偏离这一属性，或者是由于数据中的噪声，推理方法中的错误，或者仅仅是因为进化遵循网络而不是树状过程。后者特别提出了一个问题，即是否以及如何从系统发育网络中衍生出正畸图，或者同样地，由系统发育网络来解释。在这项工作中，我们研究了当潜在的进化历史遵循系统发育网络而不是树时对正形图施加的约束。我们证明了任何正交图都可以用一个足够复杂的k级网络来表示。然而，这样的网络缺乏生物学上有意义的限制。相比之下，一级网络提供了一个更简单的解释，我们建立了一级可解释的正交图的特征，即那些来自一级进化史的图。为此，我们采用了模分解，这是研究图结构的一种经典技术。具体地说，任意图是一级可解释的，当且仅当每个原始子图是近图（其中单个顶点的移除导致一个图）。此外，我们提出了一个线性时间算法来识别一级可解释的正交图，并构建一个解释它们的一级网络，如果这样的网络存在的话。最后，我们证明了一级可解释正交图与代换操作、弱弦图和完美图以及双宽最多为2的图的密切关系。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Algorithms for Molecular Biology 生物-生化研究方法

CiteScore

2.40

自引率

10.00%

发文量

审稿时长

>12 weeks

期刊介绍： Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.