An optimized filter for finding multiple repeats in DNA sequences
Authors: Maria Federico, P. Peterlongo, N. Pisanti
Published in: ACS/IEEE International Conference on Computer Systems and Applications - AICCSA 2010, 2010-05-16
DOI: 10.1109/AICCSA.2010.5587026 (https://doi.org/10.1109/AICCSA.2010.5587026)
This paper presents new optimizations that improve a state-of-the-art algorithm for filtering sequences, used as a preprocessing step when searching for multiple repeats whose occurrences are allowed a given pairwise edit distance. The target application is to find possibly long repeats with two or more occurrences, such that each pair of occurrences may differ by substitutions, insertions or deletions in up to 10 to 15% of their size. Exact detection of multiple repeats, which is akin to multiple alignment, is an NP-hard problem. To speed up computation without resorting to heuristics, one may use filters that quickly discard large parts of the input that cannot contain the repeats sought. We describe, at a theoretical level, several optimizations that can be applied to the tool that is currently the state of the art for this filtering task. Finally, we present experiments in which the optimized tool outperforms its original version.
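To make the pairwise criterion and the idea of a non-heuristic filter concrete, here is a minimal Python sketch. It is not the filter described in the paper: it uses the classic q-gram lemma as a stand-in necessary condition, and the function names, the default `q=3`, and the choice to measure the error budget against the longer of the two strings are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions, deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Number of q-grams common to a and b, counted with multiplicity."""
    grams_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    grams_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum((grams_a & grams_b).values())


def passes_filter(a: str, b: str, max_errors: int, q: int) -> bool:
    """Necessary condition from the q-gram lemma.

    If edit_distance(a, b) <= max_errors, the strings share at least
    max(len(a), len(b)) - q + 1 - q * max_errors q-grams, so a pair that
    fails this test can be discarded without losing any true repeat.
    """
    threshold = max(len(a), len(b)) - q + 1 - q * max_errors
    return shared_qgrams(a, b, q) >= threshold


def is_repeat_pair(a: str, b: str, error_percent: int = 10, q: int = 3) -> bool:
    """Cheap filter first, exact edit-distance check only for survivors.

    Assumption: the error budget is error_percent of the longer string.
    """
    max_errors = max(len(a), len(b)) * error_percent // 100
    if not passes_filter(a, b, max_errors, q):
        return False
    return edit_distance(a, b) <= max_errors


if __name__ == "__main__":
    original = "ACGTACGTACGTACGTACGT"
    mutated = "ACGTACGTACCTACGTACGT"   # one substitution at index 10
    print(is_repeat_pair(original, mutated))    # within the 10% budget
    print(is_repeat_pair(original, "T" * 20))   # rejected by the filter
```

Because the q-gram test is only a necessary condition, it never rejects a pair that satisfies the edit-distance bound; it merely avoids running the quadratic-time distance computation on pairs that clearly cannot qualify.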