Dependability of the K Minimum Values Sketch: Protection and Comparative Analysis

IF 3.6 · Zone 2 (Computer Science) · JCR Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computers Pub Date : 2024-10-09 DOI:10.1109/TC.2024.3475588

Jinhua Zhu;Zhen Gao;Pedro Reviriego;Shanshan Liu;Fabrizio Lombardi

IEEE Transactions on Computers, vol. 74, no. 1, pp. 210-221. https://ieeexplore.ieee.org/document/10711879/

Citations: 0

Abstract

Cardinality estimation is a basic operation in big data analysis. To estimate cardinality at high speed and with a low memory requirement, data sketches that provide approximate estimates are commonly used. The K Minimum Values (KMV) sketch is one of the most popular options; however, soft errors in the memory that holds a KMV sketch may substantially degrade its performance. This paper is the first to consider the impact of soft errors on the KMV sketch and to compare it with HyperLogLog (HLL), another widely used sketch for cardinality estimation. First, the dependability of KMV, that is, its operation in the presence of soft errors in memory, is studied through theoretical analysis and error-injection simulation. The evaluation results show that errors during the construction phase of KMV may cause large deviations in the estimates. Then, based on the algorithmic features of the KMV sketch, two protection schemes are proposed. The first scheme uses a single parity check (SPC) to detect errors and reduce their impact on the cardinality estimate; the second scheme exploits the incremental property of the memory list in KMV. The evaluation shows that both schemes can dramatically improve the performance of KMV, and that the SPC scheme performs better, although it requires a larger memory footprint and adds overhead to the checking operation. Finally, it is shown that soft errors in an unprotected KMV produce larger worst-case errors than in HLL, but that their average impact is lower; moreover, KMV protected with the proposed schemes is more dependable than HLL with existing protection techniques.
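
To make the ideas in the abstract concrete, the following Python sketch builds a small KMV sketch, injects a single bit flip, and applies two simple checks. It is an illustration under stated assumptions, not the paper's implementation: the 32-bit hash width, the use of truncated SHA-256, and all function names (`hash32`, `kmv_build`, `estimate_from`, `protected_estimate`, `is_strictly_increasing`) are hypothetical. The estimator (k - 1) / U, with U the k-th smallest hash scaled to (0, 1], is the standard KMV estimator. The parity-based fallback and the ordering check are only loosely inspired by the SPC scheme and the "incremental property" mentioned above; the abstract does not describe how the actual schemes work.

```python
import bisect
import hashlib

HASH_BITS = 32  # assumed hash width for this illustration


def hash32(item: str) -> int:
    """Map an item to a 32-bit integer (truncated SHA-256, an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest()[:4], "big")


def kmv_build(items, k):
    """Return the k smallest distinct hash values, sorted in ascending order."""
    mins = []
    for item in items:
        h = hash32(item)
        i = bisect.bisect_left(mins, h)
        if i < len(mins) and mins[i] == h:
            continue                  # this hash value is already stored
        if len(mins) < k or h < mins[-1]:
            mins.insert(i, h)
            if len(mins) > k:
                mins.pop()            # evict the largest stored value
    return mins


def estimate_from(value: int, rank: int) -> float:
    """Standard KMV estimator (rank - 1) / U for the rank-th smallest hash."""
    return (rank - 1) * 2**HASH_BITS / (value + 1)


def parity(x: int) -> int:
    """Single parity bit: 1 if x has an odd number of set bits."""
    return bin(x).count("1") & 1


def protected_estimate(mins, parities):
    """Hypothetical recovery rule: estimate from the highest-ranked entry
    whose stored parity bit still matches its value."""
    for idx in range(len(mins) - 1, 0, -1):
        if parity(mins[idx]) == parities[idx]:
            return estimate_from(mins[idx], idx + 1)
    raise ValueError("no entry passed the parity check")


def is_strictly_increasing(mins) -> bool:
    """Ordering check: a sorted KMV list must grow from entry to entry."""
    return all(a < b for a, b in zip(mins, mins[1:]))


k = 64
items = [f"user-{i}" for i in range(10_000)]   # 10,000 distinct items
mins = kmv_build(items, k)
parities = [parity(v) for v in mins]           # one parity bit per entry
clean = estimate_from(mins[-1], k)

# Soft error: flip the most significant bit of the k-th minimum.
faulty = list(mins)
faulty[-1] ^= 1 << (HASH_BITS - 1)
corrupted = estimate_from(faulty[-1], k)
recovered = protected_estimate(faulty, parities)

# A flip in the middle of the list breaks the ascending order instead.
faulty_mid = list(mins)
faulty_mid[k // 2] ^= 1 << (HASH_BITS - 1)

print(f"clean={clean:.0f} corrupted={corrupted:.0f} recovered={recovered:.0f}")
print("order intact after mid-list flip:", is_strictly_increasing(faulty_mid))
```

In this toy setting, a flip in the high-order bit of the k-th minimum collapses the unprotected estimate, while the parity check lets the estimator fall back to the next-ranked entry. Note that the ordering check alone would not catch that particular error, because the corrupted last entry is still larger than its predecessor; it does catch the mid-list flip.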




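Source journal

IEEE Transactions on Computers (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.60
Self-citation rate: 5.40%
Articles published: 199
Review time: 6.0 months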

About the journal: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.