Efficient mining of incremental high utility patterns with negative unit profits over all the accumulated stream data

Impact factor 7.6 · Zone 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems, Pub Date: 2025-06-13, DOI: 10.1016/j.knosys.2025.113956

Doyoung Kim, Heonho Kim, Seungwan Park, Hanju Kim, Myungha Cho, Seongbin Park, Taewoong Ryu, Chanhee Lee, Hyeonmo Kim, Unil Yun

Volume 325, Article 113956. Publisher page: https://www.sciencedirect.com/science/article/pii/S0950705125010019

Citations: 0

Abstract

Traditional high utility pattern mining has assumed that items in databases have positive unit profits, but real-life applications often require negative unit profits to be considered as well. Accordingly, many algorithms that handle both positive and negative unit profits have been proposed for static data environments. Meanwhile, handling accumulated stream data is one of the most important aspects of data analysis in real-world systems. However, existing methods that consider negative unit profits in a static environment are inadequate for processing data streams, because they require repeated data access and consume additional resources through multiple data scans. This paper proposes an effective method for high utility stream pattern mining that considers both positive and negative unit profits in dynamic databases. To avoid storing data in memory and scanning it multiple times, the proposed approach constructs its data structure with a single scan of the incremental data, without keeping that data in memory. Then, through a reconstruction process, it efficiently integrates and manages the new data while maintaining its structures optimally. This methodology enables efficient mining without the loss of significant patterns. Experiments on real and synthetic datasets show that the proposed approach outperforms state-of-the-art methods, including adapted approaches, in runtime, memory usage, and scalability. In addition, the proposed method performs better than the baseline method with respect to the resources used by each process and the number of incremental databases. A further statistical evaluation of accuracy shows that the proposed method extracts results without pattern loss or duplication.
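The abstract rests on two ideas: item utilities can be negative, and each incremental batch is scanned only once. The Python sketch below illustrates both under stated assumptions. It uses the standard positive-transaction-utility upper bound from the negative-unit-profit literature (sometimes called PTWU), not the authors' own data structure or reconstruction process, which the abstract does not describe. The unit profits, the batches, and the `IncrementalPTWU` class are hypothetical.

```python
from collections import defaultdict

# Hypothetical unit profits (external utilities); item "c" has a negative
# profit, e.g. a product sold at a loss as part of a promotion.
UNIT_PROFIT = {"a": 5, "b": 2, "c": -3, "d": 4}


def item_utility(item: str, qty: int) -> int:
    """Utility of one item occurrence: unit profit times purchased quantity."""
    return UNIT_PROFIT[item] * qty


def itemset_utility(itemset, tx: dict[str, int]) -> int:
    """Utility of an itemset in one transaction (0 if not fully contained)."""
    if not all(item in tx for item in itemset):
        return 0
    return sum(item_utility(item, tx[item]) for item in itemset)


def transaction_utility(tx: dict[str, int]) -> int:
    """Ordinary transaction utility; negative items pull it down."""
    return sum(item_utility(item, q) for item, q in tx.items())


def positive_tu(tx: dict[str, int]) -> int:
    """Positive transaction utility: only items with positive utility count."""
    return sum(max(item_utility(item, q), 0) for item, q in tx.items())


class IncrementalPTWU:
    """Hypothetical accumulator of a per-item upper bound over all batches.

    Each batch is scanned once and then discarded; only the running
    per-item sums are kept in memory.
    """

    def __init__(self) -> None:
        self.ptwu = defaultdict(int)
        self.n_tx = 0

    def add_batch(self, batch) -> None:
        for tx in batch:
            ptu = positive_tu(tx)
            for item in tx:
                self.ptwu[item] += ptu
            self.n_tx += 1

    def promising(self, min_util: int) -> set[str]:
        """Items whose bound reaches min_util; any itemset containing another
        item has utility below min_util, since the bound is anti-monotone."""
        return {item for item, bound in self.ptwu.items() if bound >= min_util}


# Two hypothetical batches arriving over time (item -> quantity).
batch1 = [{"a": 1, "c": 2}, {"a": 2, "b": 1}]
batch2 = [{"b": 3, "c": 1, "d": 1}, {"a": 1, "d": 2}]

acc = IncrementalPTWU()
for batch in (batch1, batch2):
    acc.add_batch(batch)

# The ordinary TU is not a valid bound here: u({a}) = 5 exceeds TU = -1.
tx = batch1[0]
print(transaction_utility(tx), positive_tu(tx), itemset_utility(("a",), tx))
# prints: -1 5 5
print(dict(acc.ptwu), acc.promising(20))
```

Counting only positive utilities is what keeps the bound safe: with negative profits, the ordinary transaction utility can fall below the utility of an itemset it contains, as the first transaction shows. A complete miner would still need a structure that retains enough per-transaction information to compute exact utilities after pruning; that part is deliberately left out of this sketch.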




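Source journal

Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months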

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.