Sketching via hashing: from heavy hitters to compressed sensing to sparse Fourier transform
P. Indyk
Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 87-90
Published: 2013-06-22
DOI: https://doi.org/10.1145/2463664.2465217
Citation count: 11
Abstract
Sketching via hashing is a popular and useful method for processing large data sets. Its basic idea is as follows. Suppose that we have a large multi-set of elements S = {a_1, ..., a_s} ⊂ {1, ..., n}, and we would like to identify the elements that occur “frequently” in S. The algorithm starts by selecting a hash function h that maps the elements into an array c[1 ... m]. The array entries are initialized to 0. Then, for each element a ∈ S, the algorithm increments c[h(a)]. At the end of the process, each array entry c[j] contains the count of all data elements a ∈ S mapped to j. It can be observed that if an element a occurs frequently enough in the data set S, then the value of the counter c[h(a)] must be large. That is, “frequent” elements are mapped to “heavy” buckets. By identifying the elements mapped to heavy buckets and repeating the process several times, one can efficiently recover the frequent elements, possibly together with a few extra ones (false positives).
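
The procedure described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm as such: the function name `heavy_hitter_candidates`, the parameters `threshold`, `repetitions`, and `seed`, and the choice of a random affine hash modulo a prime are all assumptions made for illustration. The code keeps one counter array per repetition, updates every array in a single pass over the data, and reports the elements of {1, ..., n} that land in a heavy bucket in every repetition.

```python
import random

# Assumption: a prime larger than the universe size n, used by the
# illustrative hash family h(x) = ((a*x + b) mod P) mod m.
P = 2_147_483_647  # 2^31 - 1


def heavy_hitter_candidates(stream, n, m, threshold, repetitions=4, seed=0):
    """Return the elements of {1..n} whose bucket is heavy in every repetition.

    A bucket is treated as "heavy" when its counter is at least `threshold`.
    Counters only overestimate an element's true count, so every element
    occurring at least `threshold` times is returned; elements that collide
    with frequent ones may also appear (false positives).
    """
    rng = random.Random(seed)
    # One independently chosen hash function (a, b) per repetition.
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(repetitions)]
    # One array c[0..m-1] per repetition, initialized to 0.
    counters = [[0] * m for _ in range(repetitions)]

    # Single pass over the data: increment c[h(a)] in every repetition.
    for x in stream:
        for (a, b), c in zip(params, counters):
            c[((a * x + b) % P) % m] += 1

    # Keep the elements that map to a heavy bucket in all repetitions.
    return {
        x
        for x in range(1, n + 1)
        if all(c[((a * x + b) % P) % m] >= threshold
               for (a, b), c in zip(params, counters))
    }


if __name__ == "__main__":
    # Elements 7 and 3 are frequent; every element of {1..100} also occurs once.
    data = [7] * 50 + [3] * 40 + list(range(1, 101))
    print(sorted(heavy_hitter_candidates(data, n=100, m=32, threshold=40)))
```

Intersecting over several repetitions is what trims the false positives: an infrequent element has to collide with a heavy bucket under every hash function to survive. Enumerating the whole universe at the end is a simplification that is only reasonable when n is small.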