Extraction of Ambiguous Sequential Patterns with Least Minimum Generalization from Mismatch Clusters

2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System. Pub Date: 2007-12-16. DOI: 10.1109/SITIS.2007.104

Kotaro Araki, Keiichi Tamura, Tomoyuki Kato, Y. Mori, H. Kitakami


Citations: 4

Abstract

An ambiguous query over a sequence database returns a set of similar subsequences, called a mismatch cluster, to the user. The inherent problem is that it is difficult for users to identify the characteristics of the very large number of similar subsequences in a mismatch cluster. To help users understand mismatch clusters, it is important to extract from each mismatch cluster a set of ambiguous sequential patterns with the least minimum generalization. Extracting this set of ambiguous sequential patterns requires an enormous amount of computation time, because generalized patterns that form a minimum cover of the mismatch cluster must be discovered among the candidate generalized patterns. This paper proposes an iterative refinement method that extracts ambiguous sequential patterns with a minimum cover for mismatch clusters selected from a sequence database. It also proposes combining this method with a domain segmentation method to make pattern extraction efficient. A prototype implementing the two proposed methods was applied to three datasets from PROSITE to evaluate their usefulness. The results show that the proposed methods have a high capability to extract ambiguous sequential patterns from the mismatch clusters returned by an ambiguous query on the sequence database.
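The abstract does not spell out the pattern language or the refinement algorithm, so the following is only a rough illustration of what it means to generalize a mismatch cluster into a pattern. It assumes equal-length subsequences and a restricted PROSITE-style syntax (single residues and bracketed alternatives joined by `-`, with no `x` wildcards or gaps). Under those assumptions, the least general single pattern covering the cluster can be built column by column: keep the residue where all subsequences agree, otherwise use the class of residues observed at that position. The helpers `least_general_pattern` and `matches` are hypothetical and are not the authors' iterative refinement or domain segmentation methods.

```python
import re
from typing import List


def least_general_pattern(cluster: List[str]) -> str:
    """Build the most specific position-wise pattern matching every
    subsequence in `cluster` (hypothetical helper; assumes equal lengths)."""
    if not cluster:
        raise ValueError("cluster must be non-empty")
    length = len(cluster[0])
    if any(len(s) != length for s in cluster):
        raise ValueError("this sketch assumes equal-length subsequences")

    elements = []
    for column in zip(*cluster):  # one tuple of residues per position
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])  # all subsequences agree: keep the residue
        else:
            elements.append("[" + "".join(residues) + "]")  # generalize to a class
    return "-".join(elements)  # PROSITE-style element separator


def matches(pattern: str, sequence: str) -> bool:
    """Check a whole sequence against a pattern produced above.
    Only single residues and [..] classes are supported, so dropping
    the '-' separators yields an equivalent regular expression."""
    return re.fullmatch(pattern.replace("-", ""), sequence) is not None


if __name__ == "__main__":
    cluster = ["CADG", "CEDG", "CADH"]
    pattern = least_general_pattern(cluster)
    print(pattern)                                    # C-[AE]-D-[GH]
    print(all(matches(pattern, s) for s in cluster))  # True
    print(matches(pattern, "CEDH"))                   # True, though not in the cluster
```

The last line shows how a single covering pattern can over-generalize: `C-[AE]-D-[GH]` also accepts `CEDH`, which is not a member of the cluster. This is one motivation for describing a cluster with a set of less general patterns instead; as the abstract notes, searching the candidate generalized patterns for such covers is computationally expensive.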
