Sliding window-based high utility occupancy pattern mining for data streams

Impact factor: 8.1 | Region 1 (Computer Science) | JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS
Seungwan Park, Taewoong Ryu, Doyoon Kim, Doyoung Kim, Hanju Kim, Myungha Cho, Unil Yun
Journal: Information Sciences, volume 716, Article 122243
Publication date: 2025-04-28
Publication type: Journal Article
DOI: 10.1016/j.ins.2025.122243
URL: https://www.sciencedirect.com/science/article/pii/S0020025525003755
Open access: no
Citations: 0

Abstract

High utility pattern mining analyzes data by considering not only the frequency of items but also their quantity and profit. Within this area, studies on high utility occupancy patterns have emerged; they use an occupancy measure that reflects the share a pattern occupies in the transactions containing it. As processing real-time stream data has become more critical, a method for discovering high utility occupancy patterns in data streams was recently presented. However, that method processes all data accumulated in the stream. The volume of data to be processed therefore grows steadily, efficiency declines over time, and it becomes difficult to emphasize recent data. Consequently, such a method is less suitable for practical applications. To overcome these drawbacks, we introduce a novel approach for mining high utility occupancy patterns that employs a sliding window technique to process stream data efficiently. By focusing on a fixed-size window of the most recent data, our method reflects the trends in the latest data while improving efficiency over previous approaches. Extensive performance evaluations demonstrate the efficacy of the proposed method against prior methods in terms of runtime, memory usage, scalability, and sensitivity. Moreover, statistical tests confirm that our approach extracts the exact number of patterns, without pattern loss or duplication.
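The abstract does not spell out the algorithm, so the following is only a minimal brute-force sketch of the ideas it names, not the authors' method. It assumes the definition of utility occupancy commonly used in the literature (a pattern's utility in a transaction divided by that transaction's total utility, averaged over the transactions containing the pattern) and a window that holds a fixed number of the most recent transactions. The profit table, thresholds, sample stream, and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Hypothetical unit profits per item (illustrative values only).
PROFIT = {"a": 2, "b": 1, "c": 3}


def utility_occupancy(pattern, transaction):
    """Share of the transaction's total utility contributed by `pattern`.

    A transaction maps item -> purchased quantity. Returns None when the
    transaction does not contain every item of the pattern.
    """
    if not all(item in transaction for item in pattern):
        return None
    total = sum(PROFIT[i] * q for i, q in transaction.items())
    part = sum(PROFIT[i] * transaction[i] for i in pattern)
    return part / total


def mine_window(window, min_support, min_occupancy):
    """Enumerate every itemset over the transactions currently in the window.

    Keeps a pattern when it appears in at least `min_support` transactions
    and its average utility occupancy is at least `min_occupancy`.
    Exponential in the number of items: for illustration only.
    """
    items = sorted({i for t in window for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for pattern in combinations(items, k):
            shares = [utility_occupancy(pattern, t) for t in window]
            shares = [s for s in shares if s is not None]
            if not shares or len(shares) < min_support:
                continue
            avg = sum(shares) / len(shares)
            if avg >= min_occupancy:
                result[pattern] = (len(shares), avg)
    return result


# Hypothetical stream of transactions arriving one at a time.
stream = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]

# deque(maxlen=n) silently drops the oldest transaction when a new one
# arrives, so the amount of data mined stays bounded.
window = deque(maxlen=3)
snapshots = []
for transaction in stream:
    window.append(transaction)
    if len(window) == window.maxlen:
        snapshots.append(mine_window(window, min_support=2, min_occupancy=0.5))

for snapshot in snapshots:
    print(sorted(snapshot))
```

With this sample stream the result changes as the window slides: once the oldest transaction is evicted, the single item "a" no longer qualifies while the pairs ("a", "c") and ("b", "c") do, which illustrates how a window emphasizes recent data.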
Source journal
Information Sciences (Engineering and Technology; Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.