Formalization of Randomized Approximation Algorithms for Frequency Moments

Emin Karayel
Arch. Formal Proofs. DOI: https://doi.org/10.4230/LIPIcs.ITP.2022.21

Abstract

In 1999, Alon et al. introduced the still-active research topic of approximating the frequency moments of a data stream using randomized algorithms with minimal space usage. This includes the problem of estimating the number of distinct elements in the stream, which is the zeroth frequency moment. Higher-order frequency moments provide information about the skew of the data stream, which is, for example, critical information for parallel processing. (The k-th frequency moment of a data stream is the sum of the k-th powers of the occurrence counts of each element in the stream.) Alon et al. established both lower and upper bounds on the space complexity of these problems, which were improved in later publications. The algorithms have guaranteed success probabilities and accuracies without making any assumptions about the input distribution. They are an interesting use case for formal verification because their correctness proofs require a large body of deep results from algebra, analysis and probability theory. This work reports on the formal verification of three algorithms for the approximation of F_0, F_2 and F_k for k ≥ 3. The results include the identification of significantly simpler algorithms with the same runtime and space complexities as the previously known ones, as well as the development of several reusable components, such as a formalization of universal hash families, amplification methods for randomized algorithms, a model for one-pass data stream algorithms, and a generic, flexible encoding library for the verification of space complexities.
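To make the definition of the k-th frequency moment concrete, the following is a minimal sketch that computes it exactly by counting every element. It is only a reference baseline under the assumption that the whole stream fits in memory; it is not one of the verified space-efficient randomized algorithms, and the function name `frequency_moment` and the sample stream are illustrative choices, not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Iterable


def frequency_moment(stream: Iterable[Hashable], k: int) -> int:
    """Exact k-th frequency moment of a stream (illustrative baseline).

    Sums the k-th powers of the occurrence counts of each distinct
    element. Uses memory proportional to the number of distinct
    elements, which is exactly what the randomized algorithms avoid.
    """
    counts = Counter(stream)  # occurrence count per distinct element
    return sum(c ** k for c in counts.values())


# Hypothetical sample stream: "a" occurs 3 times, "b" twice, "c" once.
stream = ["a", "b", "a", "c", "a", "b"]

f0 = frequency_moment(stream, 0)  # number of distinct elements
f1 = frequency_moment(stream, 1)  # length of the stream
f2 = frequency_moment(stream, 2)  # 3**2 + 2**2 + 1**2
print(f0, f1, f2)
```

For this sample stream the moments are F_0 = 3, F_1 = 6 and F_2 = 14; the larger F_2 is relative to F_1, the more skewed the stream.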