Scalable Contrast Pattern Mining over Data Streams

E. Chavary, S. Erfani, C. Leckie
Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021-10-26
DOI: 10.1145/3459637.3482174 (https://doi.org/10.1145/3459637.3482174)
Citations: 0

Abstract

Incremental contrast pattern mining (CPM) is an important task in fields such as network traffic analysis, medical diagnosis, and customer behavior analysis. As the speed and dimensionality of data streams increase, a major challenge for CPM is handling the huge number of candidate patterns generated. Although some work exists on incremental CPM, those approaches do not scale to dense, high-dimensional data streams, and CPM over an evolving dataset remains an open challenge. In this work we focus on extracting the most specific set of contrast patterns (CPs) to discover significant changes between two data streams. We devise a novel algorithm that extracts CPs from previously mined patterns instead of generating all patterns in each window from scratch. Our experimental results on a wide variety of datasets demonstrate the efficiency advantages of our approach over the state of the art.
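To make the notion of a contrast pattern concrete, the following is a minimal sketch of a naive, from-scratch baseline, not the incremental algorithm proposed in the paper. It assumes a common definition in which a pattern (itemset) is a contrast pattern when it is frequent in one window and its support ratio against the other window (the growth rate) exceeds a threshold. The function names, the thresholds `min_support` and `min_growth`, the `max_len` cap, the toy data, and the reading of "most specific" as "has no proper superset among the found patterns" are all illustrative assumptions rather than details taken from the abstract.

```python
from itertools import combinations


def support(pattern, transactions):
    """Fraction of transactions that contain every item of `pattern`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def contrast_patterns(window_a, window_b, min_support=0.5, min_growth=2.0, max_len=2):
    """Brute-force search (assumed baseline): enumerate every itemset up to
    `max_len` items, keep those frequent in window_a whose growth rate
    support_a / support_b is at least `min_growth`."""
    items = sorted(set().union(*window_a)) if window_a else []
    found = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            sup_a = support(pattern, window_a)
            if sup_a < min_support:
                continue
            sup_b = support(pattern, window_b)
            # A pattern absent from window_b has unbounded growth.
            growth = float("inf") if sup_b == 0 else sup_a / sup_b
            if growth >= min_growth:
                found[pattern] = growth
    return found


def most_specific(patterns):
    """Assumed interpretation: keep patterns with no proper superset in the result."""
    return {p for p in patterns if not any(p < q for q in patterns)}


# Toy windows standing in for two data streams (hypothetical data).
window_a = [{"tcp", "port80"}, {"tcp", "port80"}, {"tcp", "port22"}, {"udp", "port53"}]
window_b = [{"tcp", "port22"}, {"udp", "port53"}, {"udp", "port53"}, {"tcp", "port80"}]

result = contrast_patterns(window_a, window_b)
specific = most_specific(result)
```

Because this baseline re-enumerates all candidate itemsets for every window, its cost grows combinatorially with the number of items, which is the scalability problem the abstract says motivates reusing previously mined patterns.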