Genetic algorithm for dimer-led and error-restricted spaced motif discovery

2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Pub Date: 2013-04-16, DOI: 10.1109/CIBCB.2013.6595409

Tak-Ming Chan, Leung-Yau Lo, M. Wong, Yong Liang, K. Leung

Citations: 0

Abstract

DNA motif discovery is an important problem for deciphering protein-DNA binding in gene regulation. To discover generic spaced motifs, which consist of multiple conserved patterns separated by wildcard regions called spacers, the genetic algorithm (GA)-based method GASMEN was proposed and shown to outperform related methods. However, allowing an arbitrary number of spacers makes the model overly generic and, in practice, harder to optimize. In protein-DNA binding case studies, complex spaced motifs are rare, whereas dimers (two conserved patterns separated by a single spacer) are the more common form. Moreover, errors (mismatches) within a conserved pattern are not distributed arbitrarily, because certain highly conserved nucleotides are essential for maintaining binding. To improve optimization in real applications, we developed a new method, the GA for Dimer-led and Error-restricted Spaced Motifs (GADESM). GADESM gives special attention to these common spaced motifs by using dimer-led initialization when creating the initial population. Results on real datasets show that dimer-led initialization lets GADESM achieve statistically significantly better fitness than GASMEN. With the addition of error-restricted retrieval of motif occurrences, GADESM also outperforms GASMEN on comprehensive simulated data and in a real ChIP-seq case study.
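The abstract does not specify GADESM's chromosome encoding, fitness function, or exact initialization procedure. The following is therefore only a minimal sketch, under stated assumptions, of the two ideas it names. A dimer motif is modeled as a left pattern, a fixed-length wildcard spacer, and a right pattern. Error-restricted matching tolerates up to `max_errors` mismatches, but none at a chosen set of conserved positions. Dimer-led initialization is approximated by copying both halves of each initial individual from a random window of the input sequences. All names (`DimerMotif`, `conserved`, `max_errors`, `find_occurrences`, `dimer_led_population`) are hypothetical and are not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DimerMotif:
    """Hypothetical dimer spaced motif: left pattern, a wildcard spacer of
    fixed length, then right pattern."""
    left: str
    spacer: int
    right: str
    # Assumption: indices into left + right (spacer excluded) where no mismatch is allowed.
    conserved: frozenset = frozenset()
    # Assumption: maximum total mismatches tolerated at the remaining positions.
    max_errors: int = 1

    @property
    def length(self) -> int:
        return len(self.left) + self.spacer + len(self.right)

    def matches(self, site: str) -> bool:
        """Error-restricted match: spacer bases are ignored, conserved
        positions must match exactly, other positions share max_errors."""
        if len(site) != self.length:
            return False
        observed = site[: len(self.left)] + site[len(self.left) + self.spacer:]
        pattern = self.left + self.right
        errors = 0
        for i, (expected, actual) in enumerate(zip(pattern, observed)):
            if expected != actual:
                if i in self.conserved:
                    return False  # mismatch at an essential nucleotide
                errors += 1
                if errors > self.max_errors:
                    return False
        return True


def find_occurrences(motif: DimerMotif, sequence: str) -> list[int]:
    """Start offsets of every window in sequence that the motif matches."""
    return [
        i
        for i in range(len(sequence) - motif.length + 1)
        if motif.matches(sequence[i : i + motif.length])
    ]


def dimer_led_population(sequences, half_len, spacer_range, size, rng):
    """Hypothetical dimer-led initialization: every individual is a dimer
    whose two halves are copied from a randomly chosen input window, so the
    initial population starts on patterns that actually occur in the data."""
    min_spacer, max_spacer = spacer_range
    if not any(len(s) >= 2 * half_len + min_spacer for s in sequences):
        raise ValueError("no sequence is long enough for the smallest dimer")
    population = []
    while len(population) < size:
        seq = rng.choice(sequences)
        spacer = rng.randint(min_spacer, max_spacer)  # inclusive on both ends
        total = 2 * half_len + spacer
        if len(seq) < total:
            continue  # window does not fit in this sequence; draw again
        start = rng.randrange(len(seq) - total + 1)
        left = seq[start : start + half_len]
        right = seq[start + half_len + spacer : start + total]
        population.append(DimerMotif(left, spacer, right))
    return population


if __name__ == "__main__":
    motif = DimerMotif("TGA", 3, "TCA", conserved=frozenset({0, 5}))
    print(find_occurrences(motif, "CCTGACGTTCAGG"))  # [2]
    pop = dimer_led_population(["ACGTACGTACGTACGT"], 3, (1, 4), 5, random.Random(0))
    print([(m.left, m.spacer, m.right) for m in pop])
```

Restricting individuals to a single spacer mirrors the abstract's observation that dimers are the common case. Forbidding mismatches at `conserved` positions reflects the point that some nucleotides are essential for binding. A real GA would additionally need a fitness function, selection, crossover, and mutation, none of which the abstract describes.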
