Maximum Coverage in Sublinear Space, Faster

Stephen Jaud, Anthony Wirth, F. Choudhury
DOI: 10.48550/arXiv.2302.06137 (https://doi.org/10.48550/arXiv.2302.06137)
Published: 2023-02-13
Listed journal: Bulletin of the Society of Sea Water Science, Japan, volume "37 1", pages 21:1-21:20
Citations: 1

Abstract

Given a collection of $m$ sets from a universe $\mathcal{U}$, the Maximum Set Coverage problem consists of finding $k$ sets whose union has the largest cardinality. This problem is NP-hard, but a polynomial-time algorithm approximates the optimal solution to within a factor of $1-1/e$. However, this algorithm does not scale well with the input size. In a streaming context, practical high-quality solutions exist, but their space complexity scales linearly with the size of the universe $|\mathcal{U}|$. One randomized streaming algorithm, however, has been shown to produce a $1-1/e-\varepsilon$ approximation of the optimal solution with a space complexity that scales only poly-logarithmically with $m$ and $|\mathcal{U}|$. To achieve such a low space complexity, its authors used a technique called subsampling, based on hash functions with limited independence. This article focuses on this sublinear-space algorithm and introduces methods to reduce the time cost of subsampling. We first show how to accelerate the algorithm by several orders of magnitude without altering the space complexity, number of passes, or approximation quality of the original algorithm. Second, we derive a new lower bound for the probability of producing a $1-1/e-\varepsilon$ approximation using only pairwise independence: $1-\tfrac{4}{c k \log m}$, compared to the original $1-\tfrac{2e}{m^{ck/6}}$. Although the theoretical approximation guarantees are weaker, for large streams our algorithm performs well in practice and presents the best time-space-performance trade-off for maximum coverage in streams.
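
To make the two ingredients named in the abstract concrete, the following is a minimal offline sketch in Python, not the authors' streaming algorithm. It pairs the classic greedy heuristic (the polynomial-time $1-1/e$ approximation) with element subsampling driven by a pairwise-independent hash of the form $h(x) = (ax + b) \bmod p$. The function names, the choice of prime, the `keep_prob` parameter, and the assumption that elements are non-negative integers smaller than the prime are all illustrative assumptions, not taken from the paper.

```python
import random

# Assumption: element ids are non-negative integers smaller than this prime.
PRIME = (1 << 61) - 1


def make_pairwise_hash(rng):
    """Draw h(x) = (a*x + b) mod PRIME with a, b uniform in [0, PRIME).

    For distinct inputs below PRIME, this family is pairwise independent.
    """
    a = rng.randrange(PRIME)
    b = rng.randrange(PRIME)
    return lambda x: (a * x + b) % PRIME


def subsample(sets, keep_prob, rng):
    """Keep an element iff its hash falls below keep_prob * PRIME.

    The same hash is used for every set, so an element is either kept
    in all sets that contain it or dropped from all of them.
    """
    h = make_pairwise_hash(rng)
    threshold = int(keep_prob * PRIME)
    return [{x for x in s if h(x) < threshold} for s in sets]


def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one that covers the most new elements."""
    covered, chosen = set(), []
    for _ in range(min(k, len(sets))):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


if __name__ == "__main__":
    rng = random.Random(0)
    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]
    sampled = subsample(sets, keep_prob=0.5, rng=rng)
    chosen, _ = greedy_max_coverage(sampled, k=2)
    print("chosen set indices on the subsample:", chosen)
```

In this sketch, running the greedy step on the subsample rather than on the full sets is what shrinks the working universe; the coverage measured on the subsample can be divided by `keep_prob` to estimate coverage on the original universe. How the sampling rate is chosen and how the passes over the stream are organized are left out here, since the abstract does not specify them.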