{"title":"Dependability of the K Minimum Values Sketch: Protection and Comparative Analysis","authors":"Jinhua Zhu;Zhen Gao;Pedro Reviriego;Shanshan Liu;Fabrizio Lombardi","doi":"10.1109/TC.2024.3475588","DOIUrl":null,"url":null,"abstract":"A basic operation in big data analysis is to find the cardinality estimate; to estimate the cardinality at high speed and with a low memory requirement, data sketches that provide approximate estimates, are usually used. The K Minimum Value (KMV) sketch is one of the most popular options; however, soft errors on memories in KMV may substantially degrade performance. This paper is the first to consider the impact of soft errors on the KMV sketch and to compare it with HyperLogLog (HLL), another widely used sketch for cardinality estimate. Initially, the operation of KMV in the presence of soft errors (so its dependability) in the memory is studied by a theoretical analysis and simulation by error injection. The evaluation results show that errors during the construction phase of KMV may cause large deviations in the estimate results. Subsequently, based on the algorithmic features of the KMV sketch, two protection schemes are proposed. The first scheme is based on using a single parity check (SPC) to detect errors and reduce their impact on the cardinality estimate; the second scheme is based on the incremental property of the memory list in KMV. The presented evaluation shows that both schemes can dramatically improve the performance of KMV, and the SPC scheme performs better even though it requires more memory footprint and overheads in the checking operation. Finally, it is shown that soft errors on the unprotected KMV produce larger worst-case errors than in HLL, but the average impact of errors is lower; also, the protected KMV using the proposed schemes are more dependable than HLL with existing protection techniques.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"210-221"},"PeriodicalIF":3.6000,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10711879/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0
Abstract
A basic operation in big data analysis is to find the cardinality estimate; to estimate the cardinality at high speed and with a low memory requirement, data sketches that provide approximate estimates, are usually used. The K Minimum Value (KMV) sketch is one of the most popular options; however, soft errors on memories in KMV may substantially degrade performance. This paper is the first to consider the impact of soft errors on the KMV sketch and to compare it with HyperLogLog (HLL), another widely used sketch for cardinality estimate. Initially, the operation of KMV in the presence of soft errors (so its dependability) in the memory is studied by a theoretical analysis and simulation by error injection. The evaluation results show that errors during the construction phase of KMV may cause large deviations in the estimate results. Subsequently, based on the algorithmic features of the KMV sketch, two protection schemes are proposed. The first scheme is based on using a single parity check (SPC) to detect errors and reduce their impact on the cardinality estimate; the second scheme is based on the incremental property of the memory list in KMV. The presented evaluation shows that both schemes can dramatically improve the performance of KMV, and the SPC scheme performs better even though it requires more memory footprint and overheads in the checking operation. Finally, it is shown that soft errors on the unprotected KMV produce larger worst-case errors than in HLL, but the average impact of errors is lower; also, the protected KMV using the proposed schemes are more dependable than HLL with existing protection techniques.
期刊介绍:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.