Extraction of Ambiguous Sequential Patterns with Least Minimum Generalization from Mismatch Clusters

Kotaro Araki, Keiichi Tamura, Tomoyuki Kato, Y. Mori, H. Kitakami
DOI: 10.1109/SITIS.2007.104
Published in: 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, 16 December 2007
Citations: 4

Abstract

An ambiguous query against a sequence database returns to the user a set of similar subsequences, called a mismatch cluster. The inherent problem is that users find it difficult to identify the characteristics of a mismatch cluster when it contains a very large number of similar subsequences. To support user comprehension of mismatch clusters, it is important to extract from each cluster a set of ambiguous sequential patterns with the least minimum generalization. Extracting this pattern set requires an enormous amount of computation time, because generalized patterns with minimum cover for the mismatch cluster must be discovered from the candidate generalized patterns. This paper proposes an iterative refinement method for extracting ambiguous sequential patterns with minimum cover for mismatch clusters selected from a sequence database. It also proposes combining this method with a domain segmentation method to make pattern extraction efficient. A prototype implementing the two proposed methods was applied to three datasets from PROSITE to evaluate their usefulness. The results show that the proposed methods are highly capable of extracting ambiguous sequential patterns from the mismatch clusters returned by an ambiguous query on the sequence database.
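The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea, under assumptions that are not taken from the paper: a mismatch cluster is a list of equal-length subsequences (substitutions only, no insertions or deletions), a pattern is a PROSITE-like sequence of residue classes, and the least generalization of a set of subsequences is the position-wise union of their residues. The names `least_general_pattern`, `to_prosite`, `matches`, and `generality` are hypothetical. This is not the authors' iterative refinement or domain segmentation method; it only shows why a set of patterns can generalize a cluster less than a single pattern does.

```python
from math import prod

# Hypothetical representation (assumption, not from the paper): a pattern is a
# list of residue classes, one frozenset of allowed residues per position.
Pattern = list[frozenset[str]]


def least_general_pattern(cluster: list[str]) -> Pattern:
    """Return the most specific pattern that still matches every subsequence.

    Assumes a mismatch-only cluster: all subsequences have equal length, so
    the least generalization is the position-wise union of residues.
    """
    length = len(cluster[0])
    if any(len(s) != length for s in cluster):
        raise ValueError("sketch assumes equal-length subsequences")
    return [frozenset(s[i] for s in cluster) for i in range(length)]


def to_prosite(pattern: Pattern) -> str:
    """Render a pattern in PROSITE-like notation, e.g. A-[CW]-D."""
    parts = []
    for cls in pattern:
        # A single allowed residue is written as-is; alternatives go in brackets.
        parts.append(next(iter(cls)) if len(cls) == 1 else "[" + "".join(sorted(cls)) + "]")
    return "-".join(parts)


def matches(pattern: Pattern, seq: str) -> bool:
    """True if seq has the pattern's length and each residue is allowed."""
    return len(seq) == len(pattern) and all(c in cls for c, cls in zip(seq, pattern))


def generality(pattern: Pattern) -> int:
    """Number of distinct strings the pattern matches (lower = less general)."""
    return prod(len(cls) for cls in pattern)


cluster = ["ACDG", "ACDH", "TWEG", "TWEH"]
whole = least_general_pattern(cluster)
print(to_prosite(whole), generality(whole))  # [AT]-[CW]-[DE]-[GH] 16

# Splitting the cluster yields two far less general patterns that still cover it.
left = least_general_pattern(cluster[:2])
right = least_general_pattern(cluster[2:])
print(to_prosite(left), to_prosite(right), generality(left) + generality(right))
# A-C-D-[GH] T-W-E-[GH] 4

# "ACEG" is not in the cluster but still matches the single pattern.
print(matches(whole, "ACEG"), matches(left, "ACEG"), matches(right, "ACEG"))
# True False False
```

In this toy cluster the single least-general pattern also matches 12 strings that are not in the cluster, while the two sub-cluster patterns match exactly the four members. Deciding which patterns to use is where the abstract locates the computational cost; this sketch does not attempt that search.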