Adaptive MCMC parallelisation in Stan

MODSIM2023, 25th International Congress on Modelling and Simulation. Pub Date: 2023-08-01. DOI: 10.36334/modsim.2023.stenborg

T. Stenborg



Abstract

Stan is a probabilistic programming language that uses Markov chain Monte Carlo (MCMC) sampling for Bayesian inference (Carpenter et al.). Stan sampling can be parallelised by running m Markov chains on n separate processing cores, i.e. ≥ 1 chain/core, for Amdahlian speedup (Annis et al.). An extension, introduced here, is adaptive parallelisation. First, prior to planned sampling, performance benchmarking was dynamically performed with m = 4 … M chains distributed over n = 1 … m cores (where M is the system’s number of available cores; using at least four chains is recommended (Vehtari et al.)). The best-performing configuration (m, n) was then automatically adopted (github.com/tstenborg/Stan-Adaptive-Parallelisation). To be relevant, benchmarking should proceed with the same data and compiled Stan model as the planned sampling. For efficiency, benchmarking was performed with fewer chain iterations than for inference proper, though using the same ratio of warmup to post-warmup iterations per chain (1 : 1/m, yielding an equal number of total draws per configuration). For further efficiency, each configuration was evaluated only once. One evaluation was deemed sufficient after measuring speedup variability for an example problem and a configuration near the middle of the non-hyperthreaded (m, n) configuration range of a test system (Intel Core i7-10750H). The simplifying assumption was made that results for that configuration were representative of the entire hyperthreaded and non-hyperthreaded range. Finally, for meaningful inter-configuration comparisons, a fixed seed was passed to the Stan random number generator. Warmup iterations had a significant effect on the optimum (m, n). Too few warmup iterations, though speeding up benchmarking, can leave Stan without enough adaptation time to determine efficient sampling parameters (Hecht et al.).
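The benchmarking procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the code from the author's repository. The function names, the default warmup count, and the `benchmark` callback are all hypothetical. The callback stands in for a timed Stan run on the planned data and compiled model, and is assumed to return elapsed seconds. Splitting the per-chain iterations as warmup : warmup/m is one reading of the stated 1 : 1/m ratio.

```python
from typing import Callable, Iterator, Tuple

# Hypothetical callback type: (chains m, cores n, warmup, sampling, seed) -> seconds.
Benchmark = Callable[[int, int, int, int, int], float]


def configurations(max_cores: int, min_chains: int = 4) -> Iterator[Tuple[int, int]]:
    """Yield every (m, n) with m = min_chains..max_cores chains on n = 1..m cores."""
    for m in range(min_chains, max_cores + 1):
        for n in range(1, m + 1):
            yield m, n


def iterations_per_chain(m: int, warmup: int) -> Tuple[int, int]:
    """Assumed reading of the 1 : 1/m ratio: each chain runs `warmup` warmup
    iterations and about warmup/m post-warmup iterations, so the total number
    of post-warmup draws is roughly the same for every m."""
    return warmup, max(1, warmup // m)


def best_configuration(benchmark: Benchmark, max_cores: int,
                       warmup: int = 200, seed: int = 1) -> Tuple[Tuple[int, int], float]:
    """Evaluate each configuration once with a fixed seed; return the fastest."""
    best, best_time = None, float("inf")
    for m, n in configurations(max_cores):
        n_warmup, n_sampling = iterations_per_chain(m, warmup)
        elapsed = benchmark(m, n, n_warmup, n_sampling, seed)
        if elapsed < best_time:  # strict '<' keeps the first of any tied configurations
            best, best_time = (m, n), elapsed
    if best is None:
        raise ValueError("need at least four available cores to form a configuration")
    return best, best_time
```

Taking the timing as a callback keeps the search logic independent of any particular Stan interface. In practice the callback would wrap a sampler call that accepts the chain count, the parallel core count, the warmup and sampling iterations, and the seed.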
