Approximate membership query over time-decaying windows for event stream processing
Yang Liu, Wenji Chen, Y. Guan
Proceedings of the ... International Workshop on Distributed Event-Based Systems, pp. 44-47. Published 2012-07-16.
DOI: https://doi.org/10.1145/2335484.2335489
Space-efficient data structures for approximate membership queries have a long history, starting with Bloom's work in the 1970s. Given a set A of n items and an additional item x from the same universe u of size m ≫ n, we want to decide whether x ∈ A using limited space. If A is static, there exist optimal algorithms that build a randomized data structure representing A in only (1 + o(1)) n log(1/Δ) bits, allowing a small false positive probability Δ but no false negatives. However, these optimal algorithms are not practical for many event-based systems, e.g., web services, peer-to-peer systems, and network traffic monitoring. In these systems, items are inserted or updated dynamically in a stream of events, and we are interested in recently updated items. In this paper, we propose a novel data structure to support approximate membership queries in a time-decaying window model. In this model, items are inserted one by one over a data stream, and we want to determine whether an item is among the most recent w items for any given window size w ≤ n. Our data structure requires only O(n(log(1/Δ) + log n)) bits and O(1) running time.
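To make the window model concrete, the following is a minimal Python sketch of one naive way to answer such queries. It is not the data structure proposed in the paper, whose construction the abstract does not describe. The class name `WindowedMembershipSketch`, its parameters, and the choice of BLAKE2b fingerprints are all assumptions made for illustration. The idea is to keep, for each short fingerprint, the stream position of its latest insertion: a fingerprint costs roughly log(1/Δ) plus log n bits and a position costs roughly log n bits, which gives some intuition for the shape of the stated space bound. The sketch does not attempt bit-level packing.

```python
import hashlib
from collections import deque


class WindowedMembershipSketch:
    """Illustrative only: fingerprint -> stream position of latest insertion."""

    def __init__(self, n, fingerprint_bits=32):
        if n < 1 or not 1 <= fingerprint_bits <= 64:
            raise ValueError("need n >= 1 and 1 <= fingerprint_bits <= 64")
        self.n = n                                # largest supported window size
        self.mask = (1 << fingerprint_bits) - 1   # keeps the low fingerprint bits
        self.count = 0                            # number of items inserted so far
        self.last_seen = {}                       # fingerprint -> latest position
        self.order = deque()                      # (position, fingerprint), oldest first

    def _fingerprint(self, item):
        # Hypothetical choice: truncate a 64-bit BLAKE2b digest of str(item).
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") & self.mask

    def insert(self, item):
        self.count += 1
        fp = self._fingerprint(item)
        self.last_seen[fp] = self.count
        self.order.append((self.count, fp))
        # Drop records that have fallen out of the largest window (size n).
        while self.order and self.order[0][0] <= self.count - self.n:
            pos, old_fp = self.order.popleft()
            # Only delete if no later insertion refreshed this fingerprint.
            if self.last_seen.get(old_fp) == pos:
                del self.last_seen[old_fp]

    def query(self, item, w):
        """Return True if item may be among the most recent w items (w <= n)."""
        if not 1 <= w <= self.n:
            raise ValueError("window size w must satisfy 1 <= w <= n")
        pos = self.last_seen.get(self._fingerprint(item))
        return pos is not None and pos > self.count - w


sketch = WindowedMembershipSketch(n=1000)
for event in ["login", "click", "logout"]:
    sketch.insert(event)
print(sketch.query("click", w=2))
```

In this sketch an item that really is among the last w insertions always yields True, because a colliding fingerprint can only move the stored position later. A false positive arises only when a different, more recent item shares the queried item's fingerprint, so longer fingerprints lower the error at the cost of space.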