SandwichSketch: A More Accurate Sketch for Frequent Object Mining in Data Streams

IF 10.4; Zone 2, Computer Science; Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-09-09 DOI:10.1109/TKDE.2025.3607691

Zhuochen Fan;Ruixin Wang;Zihan Jiang;Ruwen Zhang;Tong Yang;Sha Wang;Yuhan Wu;Ruijie Miao;Kaicheng Yang;Bui Cui

IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 11, pp. 6636-6650. Journal Article. https://ieeexplore.ieee.org/document/11154063/

Citations: 0

Abstract

Frequent object mining has gained considerable interest in the research community and can be split into frequent item mining and frequent set mining, depending on the type of object. While existing sketch-based algorithms have made significant progress in addressing these two tasks concurrently, they have notable limitations: they either support only software platforms with low throughput, or compromise accuracy for faster processing speed and better hardware compatibility. In this paper, we make a substantial stride towards supporting frequent object mining by designing SandwichSketch, which draws inspiration from sandwich making and proposes two techniques, double fidelity enhancement and hierarchical hot locking, to guarantee high fidelity on both tasks. We implement SandwichSketch on three platforms (CPU, Redis, and FPGA) and show that it enhances accuracy by $38.4\times$ and $5\times$ for the two tasks, respectively, on three real-world datasets. Additionally, it supports a distributed measurement scenario with less than a 0.01% decrease in Average Relative Error (ARE) when the number of nodes increases from 1 to 16.
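The abstract reports accuracy in terms of Average Relative Error (ARE) but this page does not define it. As a minimal sketch, assuming the definition commonly used in the sketch literature (the mean of |estimated - true| / true over the queried objects), ARE could be computed as below. The function name, argument names, and sample frequencies are hypothetical and are not taken from the paper.

```python
from typing import Mapping


def average_relative_error(true_freq: Mapping[str, int],
                           est_freq: Mapping[str, int]) -> float:
    """Mean of |estimate - truth| / truth over all queried objects.

    Assumption: this is the usual ARE definition in sketch papers;
    the paper's exact definition is not given on this page.
    """
    if not true_freq:
        raise ValueError("need at least one queried object")
    total = 0.0
    for obj, truth in true_freq.items():
        if truth <= 0:
            raise ValueError(f"true frequency of {obj!r} must be positive")
        # An object missing from the estimates is treated as estimated at 0.
        total += abs(est_freq.get(obj, 0) - truth) / truth
    return total / len(true_freq)


# Hypothetical frequencies, for illustration only.
truth = {"a": 100, "b": 50, "c": 10}
estimate = {"a": 110, "b": 50, "c": 8}
print(average_relative_error(truth, estimate))
```

Under this definition a lower ARE means the estimated frequencies are closer to the true ones.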




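Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology; Engineering: Electrical & Electronic)

CiteScore: 11.70

Self-citation rate: 3.40%

Articles published: 515

Review time: 6 months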

Journal description: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.