How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions

Tal Herman, Guy Rothblum
arXiv - CS - Cryptography and Security. Published 2024-09-10. DOI: arxiv-2409.06594 (https://doi.org/arxiv-2409.06594)

Abstract

As statistical analyses become more central to science, industry and society, there is a growing need to ensure correctness of their results. Approximate correctness can be verified by replicating the entire analysis, but can we verify without replication? Building on a recent line of work, we study proof-systems that allow a probabilistic verifier to ascertain that the results of an analysis are approximately correct, while drawing fewer samples and using fewer computational resources than would be needed to replicate the analysis. We focus on distribution testing problems: verifying that an unknown distribution is close to having a claimed property. Our main contribution is an interactive protocol between a verifier and an untrusted prover, which can be used to verify any distribution property that can be decided in polynomial time given a full and explicit description of the distribution. If the distribution is at statistical distance $\varepsilon$ from having the property, then the verifier rejects with high probability. This soundness property holds against any polynomial-time strategy that a cheating prover might follow, assuming the existence of collision-resistant hash functions (a standard assumption in cryptography). For distributions over a domain of size $N$, the protocol consists of $4$ messages and the communication complexity and verifier runtime are roughly $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$. The verifier's sample complexity is $\widetilde{O}\left(\sqrt{N} / \varepsilon^2 \right)$, and this is optimal up to $\mathrm{polylog}(N)$ factors (for any protocol, regardless of its communication complexity). Even for simple properties, approximately deciding whether an unknown distribution has the property can require quasi-linear sample complexity and running time. For any such property, our protocol provides a quadratic speedup over replicating the analysis.
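To make the quantities in the abstract concrete, the following is a minimal Python sketch, not taken from the paper. It computes the statistical (total variation) distance between two explicitly given distributions and compares a $\sqrt{N}/\varepsilon^2$ budget with a linear one. The function names are hypothetical, and the budgets drop all constants and polylogarithmic factors, so they only illustrate the scaling, with $N$ standing in for "quasi-linear".

```python
import math


def statistical_distance(p, q):
    """Total variation distance between two explicit distributions.

    Each distribution is a dict mapping domain elements to probabilities.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def protocol_budget(n, eps):
    """sqrt(N) / eps^2, ignoring constants and polylog factors (assumption)."""
    return math.sqrt(n) / eps ** 2


def replication_budget(n):
    """Stand-in for a quasi-linear cost: N, ignoring log factors (assumption)."""
    return float(n)


# Toy distributions over a domain of size 4 (illustrative values only).
uniform = {x: 0.25 for x in range(4)}
skewed = {0: 0.5, 1: 0.5}
print("statistical distance:", statistical_distance(uniform, skewed))

# Compare the two budgets for a large domain and a fixed distance parameter.
n, eps = 10**8, 0.1
print("protocol budget   ~", protocol_budget(n, eps))
print("replication budget ~", replication_budget(n))
```

With $\varepsilon$ held fixed, the first budget grows like $\sqrt{N}$ and the second like $N$, which is the quadratic gap the abstract refers to. For small domains or very small $\varepsilon$ the $1/\varepsilon^2$ factor can dominate, so the comparison is only meaningful asymptotically in $N$.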