Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence

IF 1.6 · Zone 3, Computer Science · Q3, COMPUTER SCIENCE, THEORY & METHODS

SIAM Journal on Computing · Pub Date: 2024-09-09 · DOI: 10.1137/22m1524321

Nima Anari, Yang P. Liu, Thuy-Duong Vuong


Citations: 0


Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence

SIAM Journal on Computing, Ahead of Print.
Abstract. We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include as special cases random spanning tree distributions and determinantal point processes. For a graph [math], we show how to approximately sample uniformly random spanning trees from [math] in [math] time per sample after an initial [math] time preprocessing. (Throughout, [math] hides polylogarithmic factors in [math].) This is the first nearly linear runtime in the output size, which is clearly optimal. For a determinantal point process on [math]-sized subsets of a ground set of [math] elements, defined via an [math] kernel matrix, we show how to approximately sample in [math] time after an initial [math] time preprocessing, where [math] is the matrix multiplication exponent. The time to compute just the weight of the output set is simply [math], a natural barrier that suggests our runtime might be optimal for determinantal point processes as well. As a corollary, we even improve the state of the art for obtaining a single sample from a determinantal point process, from the prior runtime of [math] to [math].

In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution [math] on [math] is reduced to sampling from related distributions on [math] for [math]. We show that for strongly Rayleigh distributions, the domain size can be reduced to nearly linear in the output size [math], improving the state of the art from [math] for general strongly Rayleigh distributions and the more specialized [math] for spanning tree distributions. Our reduction involves sampling from [math] domain-sparsified distributions, all of which can be produced efficiently assuming approximate overestimates for marginals of [math] are known and stored in a convenient data structure.
Having access to marginals is the discrete analogue of having access to the mean and covariance of a continuous distribution, or equivalently knowing “isotropy” for the distribution, the key behind optimal samplers in the continuous setting based on the famous Kannan–Lovász–Simonovits (KLS) conjecture. We view our result as analogous in spirit to the KLS conjecture and its consequences for sampling, but rather for discrete strongly Rayleigh measures.
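As a rough illustration of the objects the abstract refers to (the weight of an output set and the marginals of the distribution), the following is a minimal brute-force sketch. It assumes the standard definition of a determinantal point process on size-k subsets, in which a subset S has unnormalized weight equal to the determinant of the kernel's principal submatrix indexed by S. It is not the paper's algorithm: it enumerates every k-subset and therefore takes exponential time. The kernel `L`, the factor `B`, and all function names are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def subset_weight(kernel, subset):
    """Unnormalized weight of `subset`: det of the principal submatrix."""
    return det([[kernel[i][j] for j in subset] for i in subset])


def kdpp_distribution(kernel, k):
    """Brute-force probabilities of all k-subsets (exponential time)."""
    n = len(kernel)
    weights = {s: subset_weight(kernel, s) for s in combinations(range(n), k)}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


def marginals(dist, n):
    """Probability that each ground-set element i belongs to the sample."""
    return [sum(p for s, p in dist.items() if i in s) for i in range(n)]


# Hypothetical 4x4 positive semidefinite kernel, built as a Gram matrix B B^T.
B = [[1, 0], [0, 1], [1, 1], [1, -1]]
L = [[sum(x * y for x, y in zip(u, v)) for v in B] for u in B]

dist = kdpp_distribution(L, 2)
marg = marginals(dist, 4)
```

In this toy instance the marginals sum to k = 2, as they must for any distribution on size-k subsets. Overestimates of exactly these per-element marginals are the input that the abstract's domain sparsification assumes to be available.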


Source journal

SIAM Journal on Computing (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore

4.60

Self-citation rate

0.00%

Articles published

Review time

6-12 weeks

About the journal: The SIAM Journal on Computing aims to provide coverage of the most significant work going on in the mathematical and formal aspects of computer science and nonnumerical computing. Submissions must be clearly written and make a significant technical contribution. Topics include but are not limited to analysis and design of algorithms, algorithmic game theory, data structures, computational complexity, computational algebra, computational aspects of combinatorics and graph theory, computational biology, computational geometry, computational robotics, the mathematical aspects of programming languages, artificial intelligence, computational learning, databases, information retrieval, cryptography, networks, distributed computing, parallel algorithms, and computer architecture.