An Efficient Algorithm for String Motif Discovery

Francis Y. L. Chin, Henry C. M. Leung
Published in: Proceedings of the ... Asia-Pacific bioinformatics conference, volume 6 1, pages 79-88
Publication date: 2005-12-01
DOI: https://doi.org/10.1142/9781860947292_0011
Citations: 18

Abstract

Finding common patterns, or motifs, in a set of DNA sequences is an important problem in bioinformatics. One common representation of a motif is a string over the symbols A, C, G, T and N, where N stands for the wildcard symbol. In this paper, we introduce a more general motif discovery problem that avoids the weaknesses of the Planted (l,d)-Motif Problem and takes a set of control sequences as an additional input. Existing brute-force algorithms for similar problems take O(n(t+f)l5^l) time, where t and f are the numbers of input sequences and control sequences respectively, n is the length of each sequence and l is the length of the motif. We propose an efficient algorithm, called VAS, which has an expected running time O(nfl(nt)(4+1/4)) using O((nt)(4+1/4)) space for any integer k. In particular, when k = 3, the time and space complexities are O(nlf (nt)(1.0625)) and O((nt)(1.0625)) respectively. The algorithm uses voting and a graph representation to achieve better time and space complexities. This technique can also be used to improve the performance of some existing algorithms.
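To make the brute-force baseline concrete, the following is a minimal Python sketch, not the VAS algorithm and not the authors' code. It enumerates every length-l string over A, C, G, T and N and checks each candidate against all input and control sequences. The abstract does not say how a candidate is scored, so the score used here (number of input sequences containing the motif minus the number of control sequences containing it) is an assumption made for illustration, as are the function names `matches_at`, `occurs_in` and `brute_force_motifs`.

```python
from itertools import product

ALPHABET = "ACGTN"  # N is the wildcard symbol


def matches_at(motif, seq, pos):
    """Return True if motif matches seq starting at pos (N matches any symbol)."""
    window = seq[pos:pos + len(motif)]
    if len(window) != len(motif):
        return False
    return all(m == "N" or m == s for m, s in zip(motif, window))


def occurs_in(motif, seq):
    """Return True if motif matches at some position of seq."""
    return any(matches_at(motif, seq, p)
               for p in range(len(seq) - len(motif) + 1))


def brute_force_motifs(inputs, controls, l):
    """Try all 5**l candidate motifs and return (best_score, best_motifs).

    Assumed score (not from the paper): input sequences hit minus
    control sequences hit.
    """
    best_score, best = None, []
    for cand in product(ALPHABET, repeat=l):
        motif = "".join(cand)
        score = (sum(occurs_in(motif, s) for s in inputs)
                 - sum(occurs_in(motif, s) for s in controls))
        if best_score is None or score > best_score:
            best_score, best = score, [motif]
        elif score == best_score:
            best.append(motif)
    return best_score, best
```

The outer loop visits 5^l candidates, and each candidate is compared against roughly n positions in each of the t + f sequences with up to l symbol comparisons per position, which is where a bound of the form O(n(t+f)l5^l) comes from. For example, `brute_force_motifs(["AAC", "CAC"], ["GGG"], 2)` scores each two-symbol candidate against two input sequences and one control sequence.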