Genetic algorithm for dimer-led and error-restricted spaced motif discovery

Tak-Ming Chan, Leung-Yau Lo, M. Wong, Yong Liang, K. Leung
{"title":"Genetic algorithm for dimer-led and error-restricted spaced motif discovery","authors":"Tak-Ming Chan, Leung-Yau Lo, M. Wong, Yong Liang, K. Leung","doi":"10.1109/CIBCB.2013.6595409","DOIUrl":null,"url":null,"abstract":"DNA motif discovery is an important problem for deciphering protein-DNA bindings in gene regulation. To discover generic spaced motifs which have multiple conserved patterns separated by wild-cards called spacers, the genetic algorithm (GA) based GASMEN has been proposed and shown to outperform related methods. However, the over-generic modeling of any number of spacers increases the optimization difficulty in practice. In protein-DNA binding case studies, complicated spaced motifs are rare while dimers with single spacers are more common spaced motifs. Moreover, errors (mismatches) in a conserved pattern are not arbitrarily distributed as certain highly conserved nucleotides are essential to maintain bindings. Motivated by better optimization in real applications, we have developed a new method, which is GA for Dimer-led and Error-restricted Spaced Motifs (GADESM). Common spaced motifs are paid special attention to using dimer-led initialization in the population initialization. The results on real datasets show that the dimer-led initialization in GADESM achieves better fitness than GASMEN with statistical significance. With additional error-restricted motif occurrence retrieval, GADESM has shown better performance than GASMEN on both comprehensive simulation data and a real ChIP-seq case study.","PeriodicalId":350407,"journal":{"name":"2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CIBCB.2013.6595409","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

DNA motif discovery is an important problem for deciphering protein-DNA bindings in gene regulation. To discover generic spaced motifs which have multiple conserved patterns separated by wild-cards called spacers, the genetic algorithm (GA) based GASMEN has been proposed and shown to outperform related methods. However, the over-generic modeling of any number of spacers increases the optimization difficulty in practice. In protein-DNA binding case studies, complicated spaced motifs are rare while dimers with single spacers are more common spaced motifs. Moreover, errors (mismatches) in a conserved pattern are not arbitrarily distributed as certain highly conserved nucleotides are essential to maintain bindings. Motivated by better optimization in real applications, we have developed a new method, which is GA for Dimer-led and Error-restricted Spaced Motifs (GADESM). Common spaced motifs are paid special attention to using dimer-led initialization in the population initialization. The results on real datasets show that the dimer-led initialization in GADESM achieves better fitness than GASMEN with statistical significance. With additional error-restricted motif occurrence retrieval, GADESM has shown better performance than GASMEN on both comprehensive simulation data and a real ChIP-seq case study.
二聚体引导和错误限制间隔基序发现的遗传算法
DNA基序发现是破解基因调控中蛋白质-DNA结合的一个重要问题。为了发现由通配符分隔的具有多个保守模式的通用间隔基序,提出了基于遗传算法(GA)的GASMEN,并证明其优于相关方法。然而,对于任意数量的间隔器的过于泛型的建模增加了实践中的优化难度。在蛋白质- dna结合案例研究中,复杂的间隔基序是罕见的,而具有单间隔基序的二聚体是更常见的间隔基序。此外,保守模式中的错误(不匹配)不是任意分布的,因为某些高度保守的核苷酸对于维持结合是必不可少的。为了在实际应用中更好地优化,我们开发了一种新的方法,即二聚体主导和误差限制的间隔基元遗传算法(GADESM)。在种群初始化中,特别注意使用二聚体引导的初始化。在实际数据集上的结果表明,GADESM中二聚体引导的初始化比GASMEN具有更好的适应度,且具有统计学意义。GADESM在综合仿真数据和实际ChIP-seq案例研究中都表现出比GASMEN更好的性能,并且具有额外的错误限制基序发生检索功能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信