Approximate membership query over time-decaying windows for event stream processing

Yang Liu, Wenji Chen, Y. Guan
Proceedings of the ... International Workshop on Distributed Event-Based Systems, pp. 44-47. Published 2012-07-16. DOI: 10.1145/2335484.2335489 (https://doi.org/10.1145/2335484.2335489)
Citations: 0

Abstract

The search for a space-efficient data structure that supports approximate membership queries has a long history, starting with Bloom's work in the 1970s. Given a set A of n items and an additional item x from the same universe u of size m ≫ n, we want to determine whether x ∈ A using only a small (limited) amount of space. If A is static, optimal algorithms exist that build a randomized data structure representing A in only (1 + o(1))n log 1/Δ bits, allowing a small false positive rate Δ but no false negatives. However, these optimal algorithms are impractical for many event-based systems, e.g., web services, peer-to-peer systems, and network traffic monitoring. In these systems, items are inserted or updated dynamically in a stream of events, and we are interested in recently updated items. In this paper, we propose a novel data structure that supports approximate membership queries in a time-decaying window model. In this model, items are inserted one by one over a data stream, and we want to determine whether an item is among the most recent w items, for any given window size w ≤ n. Our data structure requires only O(n(log 1/Δ + log n)) bits and O(1) running time.
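To make the query model concrete, here is a minimal Python sketch of a membership query over the most recent w items of a stream. It is not the data structure proposed in the paper, whose construction the abstract does not describe. The class name DecayingWindowFilter, its methods, and the choice of BLAKE2b fingerprints are all assumptions made for illustration. Each item is reduced to a short fingerprint, and the filter remembers the stream position at which that fingerprint last appeared. Two different items can share a fingerprint, so false positives are possible, but an item that really is among the last w insertions is always reported.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, Tuple


class DecayingWindowFilter:
    """Toy filter answering "is this item among the last w insertions?".

    Hypothetical illustration only; not the structure from the paper.
    """

    def __init__(self, n: int, fingerprint_bits: int = 32) -> None:
        if n < 1 or not 1 <= fingerprint_bits <= 64:
            raise ValueError("need n >= 1 and 1 <= fingerprint_bits <= 64")
        self.n = n                                   # largest supported window
        self.fingerprint_bits = fingerprint_bits
        self.clock = 0                               # items inserted so far
        self.last_seen: Dict[int, int] = {}          # fingerprint -> latest position
        self.order: Deque[Tuple[int, int]] = deque() # (position, fingerprint), oldest first

    def _fingerprint(self, item: str) -> int:
        # Keep only the low fingerprint_bits bits of a 64-bit hash.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") & ((1 << self.fingerprint_bits) - 1)

    def insert(self, item: str) -> None:
        self.clock += 1
        fp = self._fingerprint(item)
        self.last_seen[fp] = self.clock
        self.order.append((self.clock, fp))
        # Forget positions that have slid out of the largest window.
        while self.order and self.order[0][0] <= self.clock - self.n:
            pos, old_fp = self.order.popleft()
            # Delete only if the fingerprint was not refreshed since then.
            if self.last_seen.get(old_fp) == pos:
                del self.last_seen[old_fp]

    def query(self, item: str, w: int) -> bool:
        """Return True if item may be among the most recent w insertions."""
        if not 1 <= w <= self.n:
            raise ValueError("window size must satisfy 1 <= w <= n")
        pos = self.last_seen.get(self._fingerprint(item))
        return pos is not None and pos > self.clock - w


# Usage: a stream of five events with a maximum window of four.
window_filter = DecayingWindowFilter(n=4, fingerprint_bits=64)
for event in ["a", "b", "c", "d", "e"]:
    window_filter.insert(event)
in_last_four = window_filter.query("b", 4)   # "b" is the 4th most recent item
in_last_three = window_filter.query("b", 3)  # "b" is older than the last 3 items
```

In this sketch each live entry pairs a fingerprint with a position counter, and by a union bound an absent item collides with one of at most n live fingerprints with probability at most n / 2^f for f-bit fingerprints. A Python dict does not come close to the bit-level space bound stated in the abstract, so the sketch should be read only as an illustration of the query semantics.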