Constraint Frequent Motif Detection in sequence datasets

Mr. E. Ramanujam, Dr. S. Padmavathi
DOI: 10.1109/ICOAC.2012.6416844
Published in: 2012 Fourth International Conference on Advanced Computing (ICoAC)
Publication date: 2012-12-01
Link: https://doi.org/10.1109/ICOAC.2012.6416844
Citations: 3

Abstract

The subsequence motif mining problem has a large class of applications in the field of bioinformatics, such as protein-protein interaction, protein motif mining, DNA classification, web log analysis and the like. Existing algorithms detect contiguous exact and approximate patterns but restrict the user to a given pattern length. Many algorithms proposed for the related problem suffer from poor scalability and time inefficiency, and some extract only non-contiguous exact patterns without noise, which limits their adaptation to other applications. This paper presents Constraint Frequent Motif Detection (CFMD), an algorithm that extracts both contiguous and non-contiguous patterns from short or long sequences of any length in biological databases. CFMD combines two data mining techniques: a trie-like Frequent Pattern tree (FP-Tree), which arranges patterns so that the most commonly occurring ones lie along root-to-leaf paths, and constraints, which restrict the growth of the FP-Tree and reduce its search space. The proposed CFMD is fast and scalable in extracting both contiguous and non-contiguous patterns. Its performance is demonstrated on both real and synthetic datasets.
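The abstract does not give the details of CFMD, so the following is only a minimal sketch of the general idea it describes, not the authors' implementation. It assumes a plain prefix trie rather than a true FP-Tree, a maximum pattern length as the growth constraint, and a minimum support as the pruning constraint. All names (`TrieNode`, `build_trie`, `frequent_contiguous`, `gapped_support`, `max_len`, `min_support`) are hypothetical and chosen for illustration.

```python
class TrieNode:
    """One node of a prefix trie; `support` counts distinct sequences."""

    def __init__(self):
        self.children = {}
        self.support = 0
        self.last_seq = -1  # id of the last sequence that reached this node


def build_trie(sequences, max_len):
    """Insert every contiguous substring of length <= max_len.

    `max_len` is an assumed length constraint that bounds tree growth.
    """
    root = TrieNode()
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq)):
            node = root
            for symbol in seq[start:start + max_len]:
                node = node.children.setdefault(symbol, TrieNode())
                if node.last_seq != seq_id:  # count each sequence only once
                    node.last_seq = seq_id
                    node.support += 1
    return root


def frequent_contiguous(root, min_support):
    """Collect patterns whose support is at least `min_support`.

    A branch is dropped as soon as its support is too low, because no
    extension of an infrequent pattern can be frequent.
    """
    found = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for symbol, child in node.children.items():
            if child.support >= min_support:
                found[prefix + symbol] = child.support
                stack.append((child, prefix + symbol))
    return found


def gapped_support(pattern, sequences):
    """Count sequences containing `pattern` as a non-contiguous subsequence."""
    def contains(seq):
        remaining = iter(seq)  # each match consumes the iterator up to the hit
        return all(symbol in remaining for symbol in pattern)
    return sum(contains(seq) for seq in sequences)


if __name__ == "__main__":
    dna = ["ACGTAC", "TACGA", "GGACG"]  # toy data, not from the paper
    trie = build_trie(dna, max_len=3)
    print(frequent_contiguous(trie, min_support=3))
    print(gapped_support("AGA", dna))
```

On the toy input, `ACG` and its sub-patterns are the only contiguous patterns present in all three sequences, while `AGA` occurs with gaps in two of them. A real FP-Tree would additionally order items by frequency and link equal items across branches; this sketch leaves that out.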