Frequent Elements with Witnesses in Data Streams

C. Konrad
Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, published 2021-06-20
DOI: https://doi.org/10.1145/3452021.3458330
Citations: 2

Abstract

Detecting frequent elements is among the oldest and most-studied problems in the area of data streams. Given a stream of $m$ data items in $\{1, 2, \dots, n\}$, the objective is to output items that appear at least $d$ times, for some threshold parameter $d$, and provably optimal algorithms are known today. However, in many applications, knowing only the frequent elements themselves is not enough: For example, an Internet router may not only need to know the most frequent destination IP addresses of forwarded packages, but also the timestamps of when these packages appeared or any other meta-data that "arrived" with the packages, e.g., their source IP addresses. In this paper, we introduce the witness version of the frequent elements problem: Given a desired approximation guarantee $\alpha \ge 1$ and a desired frequency $d \le \Delta$, where $\Delta$ is the frequency of the most frequent item, the objective is to report an item together with at least $d / \alpha$ timestamps of when the item appeared in the stream (or any other meta-data that arrived with the items). We give provably optimal algorithms for both the insertion-only and insertion-deletion stream settings: In insertion-only streams, we show that space $\tilde{O}(n + d \cdot n^{\frac{1}{\alpha}})$ is necessary and sufficient for every integral $1 \le \alpha \le \log n$. In insertion-deletion streams, we show that space $\tilde{O}(\frac{n \cdot d}{\alpha^2})$ is necessary and sufficient, for every $\alpha \le \sqrt{n}$.
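To make the problem statement concrete, the following is a minimal sketch of what a correct output looks like in the insertion-only setting. It is not the paper's algorithm: it is a naive baseline that stores every timestamp and therefore uses space linear in the stream length, far above the bounds quoted in the abstract. The function name `frequent_with_witnesses`, the use of stream positions as timestamps, and the rounding of $d / \alpha$ up to an integer are all assumptions made for illustration.

```python
from collections import defaultdict
from math import ceil


def frequent_with_witnesses(stream, d, alpha=1):
    """Naive baseline (hypothetical helper, not the paper's method).

    Stores every timestamp per item, so space is O(m). Returns
    (item, witnesses) with ceil(d / alpha) timestamps as soon as
    some item has been seen that often, or None if no item
    reaches that count.
    """
    need = ceil(d / alpha)          # assumption: round d / alpha up
    seen = defaultdict(list)        # item -> list of timestamps
    for t, item in enumerate(stream):
        seen[item].append(t)        # the position t serves as the witness
        if len(seen[item]) >= need:
            return item, seen[item][:need]
    return None


# Usage: item 3 occurs at positions 0, 2 and 4.
stream = [3, 1, 3, 2, 3, 1]
exact = frequent_with_witnesses(stream, d=3, alpha=1)
relaxed = frequent_with_witnesses(stream, d=3, alpha=2)
missing = frequent_with_witnesses(stream, d=4, alpha=1)
```

With $\alpha = 1$ the call must return all $d$ witnesses; with $\alpha = 2$ it may stop after $\lceil d / 2 \rceil$ of them, which is the relaxation the space bounds in the abstract exploit.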