An optimized filter for finding multiple repeats in DNA sequences

Maria Federico, P. Peterlongo, N. Pisanti
ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 2010), published 2010-05-16. DOI: 10.1109/AICCSA.2010.5587026
Citations: 3

Abstract

This paper presents new optimizations designed to improve a state-of-the-art algorithm for filtering sequences, used as a preprocessing step for finding multiple repeats in which any two occurrences may be at most a given edit distance apart. The target application is finding possibly long repeats with two or more occurrences, such that each pair of occurrences may differ by substitutions, insertions or deletions in up to 10 to 15% of their size. Because it can be viewed as a form of multiple alignment, exact detection of multiple repeats is an NP-hard problem. To increase computation speed without resorting to heuristics, one can use filters that quickly remove the large parts of the input that do not contain the searched repeats. We describe, at a theoretical level, several optimizations that can be applied to the tool that is currently the state of the art for this filtering task. Finally, we present experiments in which the optimized tool outperforms its original version.
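The abstract does not spell out the filtering criterion itself. As a minimal illustration of what a lossless filter for repeats under edit distance can look like, the sketch below applies the classical q-gram lemma (a standard technique swapped in here, not necessarily the criterion used by the tool the paper optimizes) to a single pair of candidate occurrences. The function names, the choice q = 3, and the rule that the error budget is a fixed fraction (15%) of the longer occurrence are hypothetical choices for illustration. The paper targets repeats with two or more occurrences, which this pairwise sketch does not cover.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def qgram_profile(s: str, q: int) -> Counter:
    """Multiset of all length-q substrings of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Size of the multiset intersection of the q-gram profiles of a and b."""
    return sum((qgram_profile(a, q) & qgram_profile(b, q)).values())


def qgram_threshold(a: str, b: str, q: int, k: int) -> int:
    """q-gram lemma: ed(a, b) <= k implies at least this many shared q-grams.

    Each edit operation destroys at most q of the (m - q + 1) q-grams of the
    longer string, where m is its length.
    """
    return max(len(a), len(b)) - q + 1 - k * q


def error_budget(a: str, b: str, rate: float) -> int:
    """Hypothetical rule: allowed edits = rate * length of the longer occurrence."""
    return int(rate * max(len(a), len(b)))


def may_be_repeat(a: str, b: str, q: int, rate: float) -> bool:
    """Cheap filter: returns False only if a and b certainly exceed the budget."""
    k = error_budget(a, b, rate)
    return shared_qgrams(a, b, q) >= qgram_threshold(a, b, q, k)


def is_repeat(a: str, b: str, rate: float) -> bool:
    """Exact (quadratic-time) check, run only on pairs that pass the filter."""
    return edit_distance(a, b) <= error_budget(a, b, rate)


if __name__ == "__main__":
    a = "ACGTACGTTGCA"
    b = "ACGTACCTTGCA"  # one substitution relative to a
    print(may_be_repeat(a, b, q=3, rate=0.15), is_repeat(a, b, rate=0.15))
```

The filter only counts q-grams, which is much cheaper than the dynamic-programming edit distance, and it is lossless: because the lemma is a necessary condition, a pair that fails the filter cannot be within the error budget, so only surviving pairs need the exact check.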