Optimal Algorithm for Finding DNA Motifs with Nucleotide Adjacent Dependency

Francis Y. L. Chin, Henry C. M. Leung, M. Siu, S. Yiu
Proceedings of the ... Asia-Pacific bioinformatics conference, volume 24 1, pages 343-352. Published 2007-12-01. DOI: 10.1142/9781848161092_0035 (https://doi.org/10.1142/9781848161092_0035)
Citation count: 4

Abstract

Finding motifs and their corresponding binding sites is a critical and challenging problem in studying gene expression. String and matrix representations are two popular models for representing a motif. However, both share an important weakness: they assume that the occurrence of a nucleotide in a binding site is independent of the other nucleotides. More complicated representations, such as HMMs or regular expressions, can capture nucleotide dependency. Unfortunately, these models are not practical, because they have too many parameters and require many known binding sites. Recently, Chin and Leung introduced the SPSP representation, which overcomes the limitations of these complicated models. However, discovering novel motifs in the SPSP representation is still an NP-hard problem. In this paper, based on our observations of real binding sites, we propose the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture nucleotide dependency. We develop a branch and bound algorithm (DPS-Finder) for finding optimal DPS motifs. Experimental results show that DPS-Finder can discover a length-10 motif from 22 length-500 DNA sequences within a few minutes, and that the DPS representation performs similarly to the SPSP representation.
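
The independence weakness of matrix representations can be illustrated with a small sketch. The example below is not the DPS model or the DPS-Finder algorithm, neither of which is specified in this abstract. It only shows, on hypothetical toy binding sites, how a per-position frequency matrix assigns probability to words that never occur when adjacent nucleotides co-vary. The site list and the helper names (`column_frequencies`, `matrix_probability`, `observed_frequency`) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy binding sites (not from the paper): the first two
# positions always co-vary, appearing as either "AT" or "CG".
sites = ["ATG", "CGG", "ATG", "CGG"]


def column_frequencies(sites):
    """Build a simple frequency matrix: one {base: frequency} dict per position."""
    length = len(sites[0])
    total = len(sites)
    return [
        {base: count / total for base, count in Counter(s[i] for s in sites).items()}
        for i in range(length)
    ]


def matrix_probability(freqs, word):
    """Probability of a word when every position is treated as independent."""
    p = 1.0
    for column, base in zip(freqs, word):
        p *= column.get(base, 0.0)
    return p


def observed_frequency(sites, word):
    """Fraction of the given sites that equal the word exactly."""
    return sites.count(word) / len(sites)


freqs = column_frequencies(sites)
for word in ["ATG", "CGG", "AGG", "CTG"]:
    # Print the word, its probability under independence, and its observed frequency.
    print(word, matrix_probability(freqs, word), observed_frequency(sites, word))
```

Under these toy sites, the matrix gives "AGG" and "CTG" the same probability as the real sites "ATG" and "CGG", even though neither ever occurs. A representation that captures adjacent dependency is meant to avoid this kind of mismatch.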