An improved approximation algorithm for the reversal and transposition distance considering gene order and intergenic sizes.

IF 1.7, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Algorithms for Molecular Biology Pub Date : 2021-12-29 DOI:10.1186/s13015-021-00203-7

Klairton L Brito, Andre R Oliveira, Alexsandro O Alexandrino, Ulisses Dias, Zanoni Dias

Algorithms for Molecular Biology 16(1):24. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8717661/pdf/

Citations: 4

Abstract

Background: In comparative genomics, one goal is to estimate a sequence of genetic changes capable of transforming one genome into another. Genome rearrangement events are mutations that can alter the genetic content or the arrangement of elements in a genome. Reversal and transposition are two of the most studied genome rearrangement events: a reversal inverts a segment of a genome, while a transposition swaps two consecutive segments. Initial studies in the area considered only the order of the genes. Recent works have incorporated other genetic information into the model, in particular the sizes of intergenic regions, which are the structures between each pair of adjacent genes and at the extremities of a linear genome.
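The two events described above can be illustrated with a minimal Python sketch. The representation (a list of signed genes plus a list of intergenic sizes, one before each gene and one after the last), the function names, and the cut parameters `x`, `y`, `z` are assumptions made for illustration; the abstract does not specify how intergenic sizes are redistributed at the cut points, so the rule used here (each cut splits one region and the pieces are rejoined with their new neighbours) is one common way to model it, not necessarily the authors' exact definition.

```python
from typing import List, Tuple


def intergenic_reversal(genes: List[int], regions: List[int],
                        i: int, j: int, x: int, y: int,
                        signed: bool = True) -> Tuple[List[int], List[int]]:
    """Invert genes[i..j] (inclusive, 0-based).

    regions[i] is cut after x units and regions[j + 1] after y units
    (hypothetical convention). Total intergenic size is preserved.
    """
    assert len(regions) == len(genes) + 1
    assert 0 <= i <= j < len(genes)
    assert 0 <= x <= regions[i] and 0 <= y <= regions[j + 1]
    x_rest = regions[i] - x
    y_rest = regions[j + 1] - y
    segment = genes[i:j + 1][::-1]
    if signed:
        # With known orientation, a reversal also flips each gene's sign.
        segment = [-g for g in segment]
    new_genes = genes[:i] + segment + genes[j + 1:]
    new_regions = (regions[:i] + [x + y]
                   + regions[i + 1:j + 1][::-1]   # inner regions are reversed too
                   + [x_rest + y_rest] + regions[j + 2:])
    return new_genes, new_regions


def intergenic_transposition(genes: List[int], regions: List[int],
                             i: int, j: int, k: int,
                             x: int, y: int, z: int) -> Tuple[List[int], List[int]]:
    """Swap the consecutive segments genes[i:j] and genes[j:k].

    regions[i], regions[j], regions[k] are cut after x, y, z units
    (hypothetical convention). Total intergenic size is preserved.
    """
    assert len(regions) == len(genes) + 1
    assert 0 <= i < j < k <= len(genes)
    assert 0 <= x <= regions[i] and 0 <= y <= regions[j] and 0 <= z <= regions[k]
    x_rest, y_rest, z_rest = regions[i] - x, regions[j] - y, regions[k] - z
    new_genes = genes[:i] + genes[j:k] + genes[i:j] + genes[k:]
    new_regions = (regions[:i] + [x + y_rest] + regions[j + 1:k]
                   + [z + x_rest] + regions[i + 1:j]
                   + [y + z_rest] + regions[k + 1:])
    return new_genes, new_regions


if __name__ == "__main__":
    # Reversal of the middle segment sorts the gene order.
    print(intergenic_reversal([1, -3, -2, 4], [5, 2, 7, 1, 4], 1, 2, 1, 0))
    # ([1, 2, 3, 4], [5, 1, 7, 2, 4])

    # Transposition swaps the segments [2, 3] and [1].
    print(intergenic_transposition([2, 3, 1], [1, 2, 3, 4], 0, 2, 3, 1, 1, 0))
    # ([1, 2, 3], [3, 0, 2, 5])
```

In both cases the sum of the intergenic sizes is unchanged, which is why a sorting sequence has to account for intergenic sizes as well as gene order.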

Results and conclusions: In this work, we investigate the SORTING BY INTERGENIC REVERSALS AND TRANSPOSITIONS problem on genomes sharing the same set of genes, considering both the case where the orientation of the genes is known and the case where it is unknown. We also explore a variant of the problem that generalizes the transposition event. We present an approximation algorithm that guarantees an approximation factor of 4 in both cases when considering the reversal and transposition (classic definition) events, an improvement over the previously known 4.5-approximation for the scenario where the orientation of the genes is unknown. We also present a 3-approximation algorithm that incorporates the generalized transposition event, and we propose a greedy strategy to improve the performance of the algorithms. Practical tests on simulated data indicated that the algorithms, in both cases, tend to perform better than the best-known algorithms for the problem. Lastly, we conducted experiments using real genomes to demonstrate the applicability of the algorithms.
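For readers less familiar with the term, the approximation guarantees stated above can be written out as follows. The symbols are notation assumed here for illustration and do not come from the abstract: $I$ is an instance (a source and a target genome), $d(I)$ is the minimum number of events needed to transform one into the other, and $\mathcal{A}(I)$ is the number of events in the sequence returned by the algorithm.

```latex
% Reversals and classic transpositions (orientation known or unknown):
\[
  \mathcal{A}_{4}(I) \;\le\; 4 \cdot d(I) \qquad \text{for every instance } I .
\]
% Reversals and generalized transpositions, with d' the optimum for that variant:
\[
  \mathcal{A}_{3}(I) \;\le\; 3 \cdot d'(I) \qquad \text{for every instance } I .
\]
```

These are worst-case bounds; the abstract's simulated-data tests concern how the algorithms perform in practice.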


Source journal: Algorithms for Molecular Biology (Biology, Biochemical Research Methods)

CiteScore: 2.40

Self-citation rate: 10.00%

Review time: >12 weeks

Journal description: Algorithms for Molecular Biology publishes articles on novel algorithms for biological sequence and structure analysis, phylogeny reconstruction, and combinatorial algorithms and machine learning. Areas of interest include but are not limited to: algorithms for RNA and protein structure analysis, gene prediction and genome analysis, comparative sequence analysis and alignment, phylogeny, gene expression, machine learning, and combinatorial algorithms. Where appropriate, manuscripts should describe applications to real-world data. However, pure algorithm papers are also welcome if future applications to biological data are to be expected, or if they address complexity or approximation issues of novel computational problems in molecular biology. Articles about novel software tools will be considered for publication if they contain some algorithmically interesting aspects.