QPOPSS: Query and Parallelism Optimized Space-Saving for Finding Frequent Stream Elements

Victor Jarlow, Charalampos Stylianopoulos, Marina Papatriantafilou
DOI: arxiv-2409.01749
Venue: arXiv - CS - Distributed, Parallel, and Cluster Computing
Published: 2024-09-03
Citations: 0

Abstract

The frequent elements problem, a key component in demanding stream-data analytics, involves selecting elements whose occurrence exceeds a user-specified threshold. Fast, memory-efficient $\epsilon$-approximate synopsis algorithms select all frequent elements but may overestimate them depending on $\epsilon$ (user-defined parameter). Evolving applications demand performance only achievable by parallelization. However, algorithmic guarantees concerning concurrent updates and queries have been overlooked. We propose Query and Parallelism Optimized Space-Saving (QPOPSS), providing concurrency guarantees. The design includes an implementation of the Space-Saving algorithm supporting fast queries, implying minimal overlap with concurrent updates. QPOPSS integrates this with the distribution of work and fine-grained synchronization among threads, swiftly balancing high throughput, high accuracy, and low memory consumption. Our analysis, under various concurrency and data distribution conditions, shows space and approximation bounds. Our empirical evaluation relative to representative state-of-the-art methods reveals that QPOPSS's multi-threaded throughput scales linearly while maintaining the highest accuracy, with orders of magnitude smaller memory footprint.
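For reference, the classic sequential Space-Saving algorithm, on which the abstract says QPOPSS builds, can be sketched as follows. This is a minimal single-threaded illustration under stated assumptions. It keeps ceil(1/epsilon) counters, and when all counters are in use, a new element takes over the counter with the smallest count and inherits that count as its possible overestimate. It finds the minimum by a linear scan instead of a constant-time summary structure. The class and method names (`SpaceSaving`, `update`, `frequent`) are hypothetical. The sketch does not reproduce QPOPSS's query-optimized layout, work distribution, or thread synchronization.

```python
import math
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Hypothetical sketch of sequential Space-Saving (not QPOPSS itself)."""

    def __init__(self, epsilon: float) -> None:
        # ceil(1/epsilon) counters bound every overestimate by epsilon * N.
        self.capacity = math.ceil(1.0 / epsilon)
        self.counts: Dict[Hashable, int] = {}  # estimated count per monitored item
        self.errors: Dict[Hashable, int] = {}  # maximum possible overestimate
        self.n = 0  # total number of stream elements processed

    def update(self, item: Hashable) -> None:
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # A free counter is available: start monitoring with no error.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count (linear scan for simplicity).
            # The newcomer inherits that count, which becomes its error bound.
            victim = min(self.counts, key=self.counts.__getitem__)
            min_count = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = min_count + 1
            self.errors[item] = min_count

    def frequent(self, phi: float) -> List[Tuple[Hashable, int]]:
        # Report monitored items whose estimate exceeds phi * N, largest first.
        # For phi >= epsilon, every truly phi-frequent item is included;
        # some reported items may be false positives.
        threshold = phi * self.n
        hits = [(x, c) for x, c in self.counts.items() if c > threshold]
        return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For example, with `epsilon=0.1` (10 counters) and a stream of 50 `"a"`, 30 `"b"`, and 20 distinct singleton elements, `frequent(0.2)` returns `[("a", 50), ("b", 30)]`. The singletons evict one another, but the heavy hitters are never evicted. The linear-scan eviction keeps the sketch short. Practical implementations use an ordered structure so that each update takes constant or logarithmic time.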