{"title":"A Tight Lower Bound for Comparison-Based Quantile Summaries","authors":"Graham Cormode, P. Veselý","doi":"10.1145/3375395.3387650","DOIUrl":null,"url":null,"abstract":"Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles of a stream of items, up to an error of at most ε. That is, an ε-approximate quantile summary first processes a stream and then, given any quantile query 0łe φłe 1, returns an item from the stream, which is a φ'-quantile for some φ' = φ +- ε. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna [6], stores at most O(1/ε ⋅ log ε N) items, where N is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space f(ε)⋅ o(log N), for any function f that does not depend on N. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of (1+-ε)⋅ φ, and for other related computational tasks.","PeriodicalId":412441,"journal":{"name":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3375395.3387650","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16
Abstract
Quantiles, such as the median or percentiles, provide concise and useful information about the distribution of a collection of items, drawn from a totally ordered universe. We study data structures, called quantile summaries, which keep track of all quantiles of a stream of items, up to an error of at most ε. That is, an ε-approximate quantile summary first processes a stream and then, given any quantile query 0łe φłe 1, returns an item from the stream, which is a φ'-quantile for some φ' = φ +- ε. We focus on comparison-based quantile summaries that can only compare two items and are otherwise completely oblivious of the universe. The best such deterministic quantile summary to date, due to Greenwald and Khanna [6], stores at most O(1/ε ⋅ log ε N) items, where N is the number of items in the stream. We prove that this space bound is optimal by showing a matching lower bound. Our result thus rules out the possibility of constructing a deterministic comparison-based quantile summary in space f(ε)⋅ o(log N), for any function f that does not depend on N. As a corollary, we improve the lower bound for biased quantiles, which provide a stronger, relative-error guarantee of (1+-ε)⋅ φ, and for other related computational tasks.