A Provably Correct Sampler for Probabilistic Programs
C. Hur, A. Nori, S. Rajamani, Selva Samuel
Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2015), published 2015-12-01
DOI: 10.4230/LIPIcs.FSTTCS.2015.475 (https://doi.org/10.4230/LIPIcs.FSTTCS.2015.475)
Citations: 33
Abstract
We consider the problem of inferring the implicit distribution specified by a probabilistic program. A popular inference technique for probabilistic programs, Markov chain Monte Carlo (MCMC) sampling, runs the program repeatedly and generates sample values by perturbing values produced in "previous runs". This simulates a Markov chain whose stationary distribution is the distribution specified by the probabilistic program. However, implementing MCMC sampling for probabilistic programs is non-trivial, because a variable may be updated at multiple program points. In such cases, it is unclear which values from the "previous run" should be used to generate samples for the "current run". We present an algorithm that solves this problem in the general case and formally prove that it is correct. Our algorithm handles variables that are updated multiple times along the same path, updated along different paths of a conditional statement, or repeatedly updated inside loops. We have implemented our algorithm in a tool called InferX. We empirically demonstrate that InferX produces correct results on various benchmarks, whereas existing tools such as R2 and Stan produce incorrect results on several of them.
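The abstract does not spell out the InferX algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It swaps in a standard technique, single-site random-walk Metropolis-Hastings, in which each random choice is stored under an address (a program-point label plus loop iteration) rather than under the variable name. A variable reassigned inside a loop therefore gets a distinct previous-run value for each assignment. The class `Run`, the functions `model` and `mh`, the address scheme, and the toy Gaussian program are all hypothetical. The sketch also assumes every run visits the same set of addresses, so it does not cover the conditional-branch case the paper handles.

```python
import math
import random


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of Normal(mean, sd) evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


class Run:
    """One execution of a probabilistic program (hypothetical, illustrative API).

    Each random choice is stored under an address rather than under the
    variable name it is assigned to, so a variable assigned at several
    program points (or in every loop iteration) gets one entry per assignment.
    """

    def __init__(self, replay: dict):
        self.replay = replay    # address -> value carried over from a previous run
        self.trace = {}         # address -> value used in this run
        self.log_joint = 0.0    # sum of log prior densities and log likelihoods

    def sample(self, address, mean: float, sd: float) -> float:
        # Reuse the previous run's value at this address if there is one,
        # otherwise draw a fresh value from the prior.
        if address in self.replay:
            value = self.replay[address]
        else:
            value = random.gauss(mean, sd)
        self.trace[address] = value
        self.log_joint += normal_logpdf(value, mean, sd)
        return value

    def observe(self, data: float, mean: float, sd: float) -> None:
        # Condition on observed data via its Gaussian likelihood.
        self.log_joint += normal_logpdf(data, mean, sd)


def model(run: Run) -> float:
    """Hypothetical toy program in which x is reassigned inside a loop.

    Keying choices by the name "x" alone would replay one value into all
    four assignments; the (label, iteration) addresses keep them apart.
    """
    x = run.sample("x_init", 0.0, 1.0)
    for i in range(3):
        x = run.sample(("x_step", i), x, 1.0)  # x_{i+1} ~ Normal(x_i, 1)
    run.observe(2.5, x, 0.5)                    # observed y = 2.5 ~ Normal(x, 0.5)
    return x


def mh(program, num_iters: int, step: float = 0.7, seed: int = 0) -> list:
    """Single-site random-walk Metropolis-Hastings over address-keyed traces.

    Assumes every run visits the same set of addresses (true for `model`);
    programs whose branches change that set would need extra correction
    terms in the acceptance ratio, which this sketch omits.
    """
    random.seed(seed)
    current = Run({})                  # first run: every choice drawn from the prior
    current_out = program(current)
    outputs = []
    for _ in range(num_iters):
        # Perturb one previous-run value; every other address is reused as-is.
        address = random.choice(list(current.trace))
        replay = dict(current.trace)
        replay[address] += random.gauss(0.0, step)
        proposed = Run(replay)
        proposed_out = program(proposed)
        # The proposal is symmetric, so the acceptance ratio is the joint density ratio.
        log_alpha = proposed.log_joint - current.log_joint
        if log_alpha >= 0.0 or random.random() < math.exp(log_alpha):
            current, current_out = proposed, proposed_out
        outputs.append(current_out)
    return outputs


samples = mh(model, num_iters=40_000)
posterior = samples[5_000:]             # discard burn-in
print(sum(posterior) / len(posterior))  # should be close to 10 / 4.25
```

Because the toy program is a Gaussian random walk (prior variance of the returned x is 4) with one Gaussian observation (variance 0.25), the exact posterior of the returned x has mean (2.5 / 0.25) / (1/4 + 1/0.25) = 10 / 4.25 ≈ 2.35, which gives a simple check on the sampler. Matching choices across runs by (label, iteration) is only one possible design; when branches make the set of addresses differ between runs, this simple scheme would need additional machinery.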