Flexible approximate counting

S. Mitchell, D. Day
Proceedings. International Database Engineering and Applications Symposium, volume 14 1, pages 233-239. Published 2011-09-21. DOI: 10.1145/2076623.2076655 (https://doi.org/10.1145/2076623.2076655)
Citations: 6

Abstract

Flexible approximate counting
Approximate counting [18] is useful for data stream and database summarization. It helps in settings that allow only one pass over the data, require low memory usage, and can tolerate some relative error. Approximate counters use fewer bits than exact counters; we focus on 8 bits, but our results are general. These small counters represent a sparse sequence of larger numbers, and they are incremented probabilistically based on the spacing between the numbers they represent. Our contributions are a customized distribution of counter values and efficient strategies for deciding when to increment them.

At run-time, users may independently select the spacing (accuracy) of the approximate counter for small, medium, and large values. The user also selects the maximum number to count up to, and our algorithm selects the exponential base of the spacing. These options provide additional flexibility over both classic and Csűrös's [4] floating-point approximate counting. They also provide additional structure, a useful schema for users, over Kruskal and Greenberg [13].

We describe two new and efficient strategies for incrementing approximate counters: use a deterministic countdown, or sample from a geometric distribution. In Csűrös's counter all increments are powers of two, so random bits rather than full random numbers can be used. We also provide the option to use powers of two while retaining flexibility. We show when each strategy is fastest in our implementation.
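To make the mechanics concrete, here is a minimal Python sketch of a classic Morris-style counter. It is not the authors' implementation and does not reproduce their separate spacings for small, medium, and large values. It assumes a single exponential base b, with the stored 8-bit value c representing (b^c - 1)/(b - 1), so that adjacent represented numbers are b^c apart and each event increments c with probability b^(-c). The names `choose_base` and `ApproxCounter` are hypothetical. The base is found by bisection from a user-supplied maximum count, and the increment decision draws a skip length from a geometric distribution, then counts down instead of drawing a random number per event.

```python
import math
import random


def choose_base(max_count, levels=255):
    """Bisect for a base b in (1, 2] whose top value reaches max_count.

    Assumes 255 < max_count <= 2**255 - 1 (the range this sketch covers).
    """
    def top(b):
        # Largest number representable with `levels` increments.
        return (b ** levels - 1.0) / (b - 1.0)

    lo, hi = 1.0 + 1e-9, 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if top(mid) < max_count:
            lo = mid
        else:
            hi = mid
    return hi  # top(hi) >= max_count by construction


class ApproxCounter:
    """8-bit approximate counter using geometric skips (illustrative only)."""

    def __init__(self, max_count, rng=None):
        self.base = choose_base(max_count)
        self.c = 0  # stored value, always in 0..255
        self.rng = rng or random.Random()
        # Extra state: events left until the next increment of c.
        self.skip = self._draw_skip()

    def value(self):
        """Estimated count represented by the stored value c."""
        return (self.base ** self.c - 1.0) / (self.base - 1.0)

    def _draw_skip(self):
        # Number of events up to and including the next successful
        # increment, geometric with success probability p = base**(-c).
        p = self.base ** -self.c
        if p >= 1.0:
            return 1
        u = 1.0 - self.rng.random()  # uniform in (0, 1]
        return int(math.log(u) / math.log1p(-p)) + 1

    def add(self):
        """Record one event."""
        if self.c >= 255:
            return  # saturated: the 8-bit value cannot grow further
        self.skip -= 1
        if self.skip == 0:
            self.c += 1
            self.skip = self._draw_skip()


if __name__ == "__main__":
    counter = ApproxCounter(max_count=1_000_000)
    for _ in range(10_000):
        counter.add()
    print(counter.c, round(counter.value()))
```

Because the geometric distribution is memoryless, counting down a drawn skip length gives the same increment statistics as an independent Bernoulli trial per event, while calling the random number generator only once per increment. The countdown is extra state beyond the 8-bit value; this sketch keeps one per counter purely for simplicity.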