{"title":"New algorithms for structure informed genome rearrangement.","authors":"Eden Ozeri, Meirav Zehavi, Michal Ziv-Ukelson","doi":"10.1186/s13015-023-00239-x","DOIUrl":null,"url":null,"abstract":"<p><p>We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, [Formula: see text] ([Formula: see text]), we define the basic structure-informed rearrangement measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to that of the reference gene order. Then, a structure-informed genome rearrangement distance is computed between the ordered PQ-tree and the target gene order. The second problem, [Formula: see text] ([Formula: see text]), generalizes [Formula: see text], where the gene order members are not necessarily permutations and the structure informed rearrangement measure is extended to also consider up to [Formula: see text] and [Formula: see text] gene insertion and deletion operations, respectively, when modelling the PQ-tree informed divergence process from the reference gene order to the target gene order. The first algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string and the number of leaves in the tree, and [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of [Formula: see text] is 0, then the algorithm runs in [Formula: see text] time and [Formula: see text] space. The second algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string, m is the number of leaves in the tree, [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively, and allowing up to [Formula: see text] deletions from the tree and up to [Formula: see text] deletions from the string. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of [Formula: see text] is 0) in [Formula: see text] time and [Formula: see text] space. The algorithm is implemented as a software tool, denoted MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1487 prokaryotic genomes.</p>","PeriodicalId":50823,"journal":{"name":"Algorithms for Molecular Biology","volume":"18 1","pages":"17"},"PeriodicalIF":1.5000,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691145/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Algorithms for Molecular Biology","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13015-023-00239-x","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, [Formula: see text] ([Formula: see text]), we define the basic structure-informed rearrangement measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to that of the reference gene order. Then, a structure-informed genome rearrangement distance is computed between the ordered PQ-tree and the target gene order. The second problem, [Formula: see text] ([Formula: see text]), generalizes [Formula: see text], where the gene order members are not necessarily permutations and the structure informed rearrangement measure is extended to also consider up to [Formula: see text] and [Formula: see text] gene insertion and deletion operations, respectively, when modelling the PQ-tree informed divergence process from the reference gene order to the target gene order. The first algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string and the number of leaves in the tree, and [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of [Formula: see text] is 0, then the algorithm runs in [Formula: see text] time and [Formula: see text] space. The second algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string, m is the number of leaves in the tree, [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively, and allowing up to [Formula: see text] deletions from the tree and up to [Formula: see text] deletions from the string. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of [Formula: see text] is 0) in [Formula: see text] time and [Formula: see text] space. The algorithm is implemented as a software tool, denoted MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1487 prokaryotic genomes.
我们定义了完美基因组重排领域的两个新的计算问题,并提出了三种算法来解决它们。该问题建模的重排场景考虑了反转和块交换操作,并使用pq树来指导允许的操作并计算其权重。在第一个问题[公式:见文]([公式:见文])中,我们定义了基本的基于结构的重排度量。在这里,我们假设构建pq树的基因簇的基因顺序成员是排列。表示基因簇的pq树是有序的,其叶子拼写的一系列基因id与参考基因序列相等。然后,计算有序pq树和目标基因序列之间的结构信息基因组重排距离。第二个问题,[公式:见文]([公式:见文]),推广了[公式:见文],其中基因序列成员不一定是排列,并且结构通知重排措施被扩展到分别考虑[公式:见文]和[公式:见文]基因插入和删除操作,当建模pq树通知从参考基因序列到目标基因序列的发散过程时。第一种算法在[公式:见文]时间和[公式:见文]空间中求解[公式:见文],其中[公式:见文]为节点的最大子节点数,n为字符串长度和树中叶子的个数,[公式:见文]和[公式:见文]分别为树中p节点和q节点的个数。如果[Formula: see text]的其中一个惩罚为0,则算法在[Formula: see text]时间和[Formula: see text]空间中运行。第二个算法解决[公式:看到文本][公式:看到文本][公式:看到文本]空间,(公式:看到文本)是儿童的最大数量的节点,n是字符串的长度,m是树中的叶子,[公式:看到文本]和[公式:看到文本]P-nodes和Q-nodes树的数量,分别和允许[公式:看到文本]删除从树上,[公式:看到文本]删除字符串。第三种算法旨在降低第二种算法的空间复杂度。它在[公式:见文本]时间和[公式:见文本]空间中解决了问题的一个变体(其中[公式:见文本]的惩罚之一是0)。该算法作为一个软件工具实现,命名为memm - rearrange,并应用于从1487个原核生物基因组数据集中提取的59个染色体基因簇的比较和进化分析。
期刊介绍:
Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning.
Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms.
Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.