Efficient mining of correlated sequential patterns based on null hypothesis

Web-KR '12 Pub Date : 2012-10-29 DOI:10.1145/2389656.2389660
C. Lin, Ming Ji, Marina Danilevsky, Jiawei Han
Citations: 6

Abstract

Frequent pattern mining has been widely studied in data mining research for more than a decade. However, pattern mining on real data sets is complicated: a huge number of co-occurrence patterns are usually generated, most of which are either redundant or uninformative. The true correlation relationships among data objects are buried deep in a large pile of useless information. To overcome this difficulty, mining correlations has been recognized as an important data mining task because of its many advantages over mining frequent patterns.

In this paper, we formally propose and define the task of mining frequent correlated sequential patterns from a sequential database. With this aim in mind, we re-examine various interestingness measures to select the appropriate one(s), which can disclose succinct relationships among sequential patterns. We then propose PSBSpan, an efficient algorithm built on the pattern-growth framework that mines frequent correlated sequential patterns. Our experimental study on real datasets shows that the algorithm performs well in terms of both efficiency and effectiveness.
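To make the task concrete, here is a minimal sketch of mining frequent correlated sequential patterns by pattern growth. It is not PSBSpan: the abstract does not give the algorithm's details or name the interestingness measure it selects. The sketch assumes a generic PrefixSpan-style projection over sequences of single items, and it uses all-confidence, adapted here as the support of a pattern divided by the largest support of any of its items, purely as a stand-in measure. The names `mine_frequent`, `correlated_patterns`, `min_sup`, and `min_corr` are hypothetical.

```python
from collections import defaultdict


def mine_frequent(db, min_sup):
    """Return {pattern: support} for all frequent sequential patterns.

    Assumption: each sequence is a list of single items (no itemsets), and
    support is the number of sequences containing the pattern as a
    subsequence. Generic PrefixSpan-style growth, not the paper's PSBSpan.
    """
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence id, offset) pairs: the suffix of each
        # sequence that follows the earliest match of the current prefix.
        counts = defaultdict(set)
        for sid, start in projected:
            for item in db[sid][start:]:
                counts[item].add(sid)
        for item, sids in sorted(counts.items()):
            if len(sids) < min_sup:
                continue  # infrequent extension: prune this branch
            pattern = prefix + (item,)
            results[pattern] = len(sids)
            next_projected = []
            for sid, start in projected:
                try:
                    pos = db[sid].index(item, start)
                except ValueError:
                    continue  # this sequence does not extend the prefix
                next_projected.append((sid, pos + 1))
            grow(pattern, next_projected)

    grow((), [(sid, 0) for sid in range(len(db))])
    return results


def correlated_patterns(db, min_sup, min_corr):
    """Keep frequent patterns of length >= 2 whose score reaches min_corr.

    Assumption: the score is all-confidence, sup(P) / max item support,
    chosen only for illustration.
    """
    frequent = mine_frequent(db, min_sup)
    kept = {}
    for pattern, sup in frequent.items():
        if len(pattern) < 2:
            continue
        max_item_sup = max(frequent[(item,)] for item in set(pattern))
        score = sup / max_item_sup
        if score >= min_corr:
            kept[pattern] = (sup, score)
    return kept


# Toy sequence database, invented for this example.
example_db = [
    ["a", "b", "c"],
    ["a", "c"],
    ["a", "b", "c"],
    ["b", "d"],
]
example_result = correlated_patterns(example_db, min_sup=2, min_corr=0.9)
```

In this sketch the correlation threshold is applied only after all frequent patterns have been found. An efficient miner would instead try to push the correlation constraint into the growth step so that uncorrelated branches are pruned early. How PSBSpan does this is not described in the abstract.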