Sliding window-based high utility occupancy pattern mining for data streams

IF 8.1; CAS Zone 1 (Computer Science); category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences, Vol. 716, Article 122243. Published: 2025-04-28. DOI: 10.1016/j.ins.2025.122243

Seungwan Park, Taewoong Ryu, Doyoon Kim, Doyoung Kim, Hanju Kim, Myungha Cho, Unil Yun

Citations: 0

Abstract

High utility pattern mining was proposed to analyze data by considering not only the frequency of items but also their quantities and profits. Within this area, studies on high utility occupancy patterns have emerged. These use an occupancy measure that reflects the share a pattern accounts for within the transactions that contain it. As the need to process real-time stream data has grown, a method for discovering high utility occupancy patterns in data streams was recently presented. However, that method processes all data accumulated in the stream. Because every previously accumulated transaction is processed, the volume of data to handle grows steadily, and efficiency declines over time. In addition, it becomes difficult to emphasize recent data. Consequently, such methods are less suitable for practical applications. To overcome these drawbacks, we introduce a novel approach for mining high utility occupancy patterns that uses a sliding window technique to process stream data efficiently. Our method focuses on a fixed-size window of the most recent data. As a result, it reflects trends in the latest data and is more efficient than previous approaches. Extensive performance evaluations compare the proposed method with prior methods in terms of runtime, memory usage, scalability, and sensitivity, and demonstrate its efficacy. Moreover, statistical tests confirm that our approach extracts the exact number of patterns, without pattern loss or duplication.
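
The following Python sketch gives a rough illustration of the two ideas the abstract combines. It computes utility occupancy over a fixed-size window of the most recent transactions. It assumes the definitions commonly used in the utility occupancy literature:

- The utility of itemset X in transaction T is u(X,T), the sum of quantity times unit profit over the items of X.
- The occupancy of X in T is u(X,T)/tu(T), where tu(T) is the utility of the whole transaction.
- The occupancy of X is the average of these shares over the transactions that contain X.
- A pattern qualifies when both its support and its occupancy meet user thresholds.

The function and class names (utility, transaction_utility, support_and_occupancy, SlidingWindow), the transaction-count window, the absolute support threshold min_sup, and the brute-force enumeration are all illustrative assumptions. This is not the authors' algorithm, and the sketch does not reproduce their efficient mining procedure. Their window may also be organized differently, for example in batches.

```python
from collections import deque
from itertools import combinations

# Hypothetical names and data for illustration only; this is NOT the paper's algorithm.
# A transaction is a dict {item: quantity}; `profit` maps item -> unit profit (assumed > 0).


def utility(itemset, txn, profit):
    """u(X, T): sum of quantity * unit profit for the items of X in T."""
    return sum(txn[i] * profit[i] for i in itemset)


def transaction_utility(txn, profit):
    """tu(T): total utility of every item in T."""
    return sum(q * profit[i] for i, q in txn.items())


def support_and_occupancy(itemset, window, profit):
    """Return (support count, mean of u(X,T)/tu(T) over window transactions containing X)."""
    shares = [
        utility(itemset, t, profit) / transaction_utility(t, profit)
        for t in window
        if all(i in t for i in itemset)
    ]
    if not shares:
        return 0, 0.0
    return len(shares), sum(shares) / len(shares)


class SlidingWindow:
    """Keeps only the `size` most recent transactions (assumed transaction-count window)."""

    def __init__(self, size):
        # Appending beyond maxlen silently evicts the oldest transaction.
        self.txns = deque(maxlen=size)

    def add(self, txn):
        self.txns.append(txn)

    def mine(self, profit, min_sup, min_uo):
        """Brute-force search (exponential in #items) for itemsets meeting both thresholds."""
        items = sorted({i for t in self.txns for i in t})
        patterns = {}
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                sup, uo = support_and_occupancy(itemset, self.txns, profit)
                if sup >= min_sup and uo >= min_uo:
                    patterns[itemset] = uo
        return patterns


# Usage: a window of 3 over a stream of 4 transactions, so the first one is evicted.
profit = {"a": 5, "b": 2, "c": 1}
stream = [
    {"a": 1, "b": 2},          # tu = 9
    {"a": 2, "c": 3},          # tu = 13
    {"b": 1, "c": 4},          # tu = 6
    {"a": 1, "b": 1, "c": 1},  # tu = 8
]
window = SlidingWindow(size=3)
for txn in stream:
    window.add(txn)
patterns = window.mine(profit, min_sup=2, min_uo=0.6)
print(sorted(patterns))  # [('a',), ('a', 'c'), ('b', 'c')]
```

If all four transactions were kept, ('a', 'b') would also qualify, with support 2 and occupancy (1.0 + 0.875) / 2 = 0.9375. Inside the window only one supporting transaction remains, so it drops out. This is the kind of shift toward recent data that a sliding window is meant to capture.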

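Source journal: Information Sciences (Engineering & Technology: Computer Science, Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months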

Journal description: Information Sciences: Informatics and Computer Science, Intelligent Systems, Applications is an international journal that publishes original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and survey contributions. The journal serves a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in keeping up with cutting-edge research in information science, knowledge engineering, and intelligent systems. Readers are expected to share a common interest in information science, but they come from varied backgrounds. These include engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.