{"title":"数据流的可变性","authors":"David Felber, R. Ostrovsky","doi":"10.1145/2902251.2902277","DOIUrl":null,"url":null,"abstract":"We consider the problem of tracking with small relative error an integer function f(n) defined by a distributed update stream f'(n) in the distributed monitoring model. In this model, there are k sites over which the updates f'(n) are distributed, and they must communicate with a central coordinator to maintain an estimate of f(n). Existing streaming algorithms with worst-case guarantees for this problem assume f(n) to be monotone; there are very large lower bounds on the space requirements for summarizing a distributed non-monotonic stream, often linear in the size n of the stream. However, the input streams obtaining these lower bounds are highly variable, making relatively large jumps from one timestep to the next; in practice, the impact on f(n) of any single update f'(n) is usually small. What has heretofore been lacking is a framework for non-monotonic streams that admits algorithms whose worst-case performance is as good as existing algorithms for monotone streams and degrades gracefully for non-monotonic streams as those streams vary more quickly. In this paper we propose such a framework. We introduce a stream parameter, the \"variability\" v, deriving its definition in a way that shows it to be a natural parameter to consider for non-monotonic streams. It is also a useful parameter. From a theoretical perspective, we can adapt existing algorithms for monotone streams to work for non-monotonic streams, with only minor modifications, in such a way that they reduce to the monotone case when the stream happens to be monotone, and in such a way that we can refine the worst-case communication bounds from θ(n) to Õv. From a practical perspective, we demonstrate that v can be small in practice by proving that v is O(log f(n)) for monotone streams and o(n) for streams that are \"nearly\" monotone or that are generated by random walks. We expect v to be o(n) for many other interesting input classes as well.","PeriodicalId":158471,"journal":{"name":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Variability in Data Streams\",\"authors\":\"David Felber, R. Ostrovsky\",\"doi\":\"10.1145/2902251.2902277\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We consider the problem of tracking with small relative error an integer function f(n) defined by a distributed update stream f'(n) in the distributed monitoring model. In this model, there are k sites over which the updates f'(n) are distributed, and they must communicate with a central coordinator to maintain an estimate of f(n). Existing streaming algorithms with worst-case guarantees for this problem assume f(n) to be monotone; there are very large lower bounds on the space requirements for summarizing a distributed non-monotonic stream, often linear in the size n of the stream. However, the input streams obtaining these lower bounds are highly variable, making relatively large jumps from one timestep to the next; in practice, the impact on f(n) of any single update f'(n) is usually small. What has heretofore been lacking is a framework for non-monotonic streams that admits algorithms whose worst-case performance is as good as existing algorithms for monotone streams and degrades gracefully for non-monotonic streams as those streams vary more quickly. In this paper we propose such a framework. We introduce a stream parameter, the \\\"variability\\\" v, deriving its definition in a way that shows it to be a natural parameter to consider for non-monotonic streams. It is also a useful parameter. From a theoretical perspective, we can adapt existing algorithms for monotone streams to work for non-monotonic streams, with only minor modifications, in such a way that they reduce to the monotone case when the stream happens to be monotone, and in such a way that we can refine the worst-case communication bounds from θ(n) to Õv. From a practical perspective, we demonstrate that v can be small in practice by proving that v is O(log f(n)) for monotone streams and o(n) for streams that are \\\"nearly\\\" monotone or that are generated by random walks. We expect v to be o(n) for many other interesting input classes as well.\",\"PeriodicalId\":158471,\"journal\":{\"name\":\"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-02-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2902251.2902277\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2902251.2902277","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
We consider the problem of tracking with small relative error an integer function f(n) defined by a distributed update stream f'(n) in the distributed monitoring model. In this model, there are k sites over which the updates f'(n) are distributed, and they must communicate with a central coordinator to maintain an estimate of f(n). Existing streaming algorithms with worst-case guarantees for this problem assume f(n) to be monotone; there are very large lower bounds on the space requirements for summarizing a distributed non-monotonic stream, often linear in the size n of the stream. However, the input streams obtaining these lower bounds are highly variable, making relatively large jumps from one timestep to the next; in practice, the impact on f(n) of any single update f'(n) is usually small. What has heretofore been lacking is a framework for non-monotonic streams that admits algorithms whose worst-case performance is as good as existing algorithms for monotone streams and degrades gracefully for non-monotonic streams as those streams vary more quickly. In this paper we propose such a framework. We introduce a stream parameter, the "variability" v, deriving its definition in a way that shows it to be a natural parameter to consider for non-monotonic streams. It is also a useful parameter. From a theoretical perspective, we can adapt existing algorithms for monotone streams to work for non-monotonic streams, with only minor modifications, in such a way that they reduce to the monotone case when the stream happens to be monotone, and in such a way that we can refine the worst-case communication bounds from θ(n) to Õv. From a practical perspective, we demonstrate that v can be small in practice by proving that v is O(log f(n)) for monotone streams and o(n) for streams that are "nearly" monotone or that are generated by random walks. We expect v to be o(n) for many other interesting input classes as well.