Victor Jarlow, Charalampos Stylianopoulos, Marina Papatriantafilou
arXiv - CS - Distributed, Parallel, and Cluster Computing, 2024-09-03
DOI: arxiv-2409.01749 (https://doi.org/arxiv-2409.01749)
QPOPSS: Query and Parallelism Optimized Space-Saving for Finding Frequent Stream Elements
The frequent elements problem, a key component in demanding stream-data
analytics, involves selecting elements whose number of occurrences exceeds a
user-specified threshold. Fast, memory-efficient $\epsilon$-approximate
synopsis algorithms select all frequent elements but may overestimate them
depending on $\epsilon$ (a user-defined parameter). Evolving applications demand
performance only achievable by parallelization. However, algorithmic guarantees
concerning concurrent updates and queries have been overlooked. We propose
Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency
guarantees. The design includes an implementation of the Space-Saving
algorithm supporting fast queries, implying minimal overlap with concurrent
updates. QPOPSS integrates this with the distribution of work and fine-grained
synchronization among threads, swiftly balancing high throughput, high
accuracy, and low memory consumption. Our analysis, under various concurrency
and data distribution conditions, shows space and approximation bounds. Our
empirical evaluation relative to representative state-of-the-art methods
reveals that QPOPSS's multi-threaded throughput scales linearly while
maintaining the highest accuracy, with a memory footprint that is orders of
magnitude smaller.
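
To make the building block named in the abstract concrete, here is a minimal sketch of the classic sequential Space-Saving summary together with a frequent-elements query. It is not the QPOPSS implementation: it has no concurrency, no work distribution, and none of the query optimizations the abstract describes. The class name `SpaceSaving`, the method names, the threshold parameter `phi`, and the sizing of `ceil(1 / epsilon)` counters are assumptions chosen for illustration.

```python
import math


class SpaceSaving:
    """Textbook sequential Space-Saving summary (illustrative, not QPOPSS)."""

    def __init__(self, epsilon):
        # Assumption: keep ceil(1 / epsilon) counters, the usual sizing.
        self.capacity = math.ceil(1 / epsilon)
        self.counts = {}  # element -> estimated count (never below the true count)
        self.errors = {}  # element -> upper bound on the overestimation
        self.n = 0        # number of stream elements processed so far

    def update(self, item):
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the element with the smallest count. The newcomer takes
            # over that count plus one and records it as its error bound.
            # A linear scan is used here only to keep the sketch short.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def frequent(self, phi):
        # Report every monitored element whose estimate exceeds phi * n.
        # Estimates only overshoot, so truly frequent elements are not missed,
        # but some reported elements may fall below the threshold.
        threshold = phi * self.n
        return {x: c for x, c in self.counts.items() if c > threshold}


stream = ["a"] * 6 + ["b"] * 3 + ["c", "d", "e", "f"]
ss = SpaceSaving(epsilon=0.25)  # 4 counters
for x in stream:
    ss.update(x)

print(ss.frequent(0.2))  # {'a': 6, 'b': 3}
```

In this run the elements `e` and `f` each end with an estimated count of 2 although they occur once, which is the kind of overestimation the abstract refers to; the recorded error bound of 1 accounts for the difference.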