Adaptive MCMC parallelisation in Stan

T. Stenborg
MODSIM2023, 25th International Congress on Modelling and Simulation. Published 2023-08-01.
DOI: https://doi.org/10.36334/modsim.2023.stenborg

Abstract

Stan is a probabilistic programming language that uses Markov chain Monte Carlo (MCMC) sampling for Bayesian inference (Carpenter et al.). Stan sampling can be parallelised by running m Markov chains on n separate processing cores, i.e. ≥ 1 chain/core, for Amdahlian speedup (Annis et al.). An extension, introduced here, is adaptive parallelisation. First, prior to planned sampling, performance benchmarking was performed dynamically with m = 4 … M chains distributed over n = 1 … m cores (where M is the system’s number of available cores; using at least four chains is recommended (Vehtari et al.)). The best-performing configuration (m, n) was then automatically adopted (github.com/tstenborg/Stan-Adaptive-Parallelisation). To be relevant, benchmarking should proceed with the same data and compiled Stan model as the planned sampling. For efficiency, benchmarking was performed with fewer chain iterations than for inference proper, though using the same ratio of warmup to post-warmup iterations per chain (1 : 1/m, yielding an equal number of total draws per configuration). For further efficiency, each configuration was evaluated only once. One evaluation was deemed sufficient after measuring speedup variability for an example problem and a configuration near the middle of the non-hyperthreaded (m, n) configuration range of a test system (Intel Core i7-10750H). The simplifying assumption was made that results for that configuration were representative of the entire hyperthreaded and non-hyperthreaded range. Finally, for meaningful inter-configuration comparisons, a fixed seed was passed to the Stan random number generator. Warmup iterations had a significant effect on the optimum (m, n). Too few warmup iterations, though speeding up benchmarking, can leave Stan without enough adaptation time to determine efficient sampling parameters (Hecht et al.).
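The benchmarking procedure described in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the code from the author's repository. The function names (`candidate_configs`, `benchmark_iterations`, `time_once`, `best_config`) are hypothetical, and the reading of the 1 : 1/m ratio as "post-warmup iterations per chain = warmup / m" is an assumption. The sampler is passed in as a callable; its keyword arguments mirror those of CmdStanPy's `CmdStanModel.sample`, but any interface could be substituted.

```python
import time
from typing import Callable, List, Tuple

Config = Tuple[int, int]  # (m chains, n cores)


def candidate_configs(available_cores: int, min_chains: int = 4) -> List[Config]:
    """Enumerate (m, n) with m = min_chains..M chains and n = 1..m cores."""
    return [(m, n)
            for m in range(min_chains, available_cores + 1)
            for n in range(1, m + 1)]


def benchmark_iterations(m: int, warmup: int) -> Tuple[int, int]:
    """Per-chain (warmup, post-warmup) iterations in the ratio 1 : 1/m.

    Assumption: total post-warmup draws are m * (warmup / m) = warmup for
    every configuration, up to integer rounding.
    """
    return warmup, max(1, round(warmup / m))


def time_once(run_sampler: Callable[..., object], m: int, n: int,
              warmup: int, seed: int) -> float:
    """Time a single benchmark run of one configuration with a fixed seed."""
    iter_warmup, iter_sampling = benchmark_iterations(m, warmup)
    start = time.perf_counter()
    # Hypothetical sampler interface; keyword names follow CmdStanPy's
    # CmdStanModel.sample, which would also need the data passed in.
    run_sampler(chains=m, parallel_chains=n, iter_warmup=iter_warmup,
                iter_sampling=iter_sampling, seed=seed)
    return time.perf_counter() - start


def best_config(available_cores: int,
                measure: Callable[[int, int], float],
                min_chains: int = 4) -> Config:
    """Evaluate each configuration once and return the fastest (m, n)."""
    configs = candidate_configs(available_cores, min_chains)
    if not configs:
        raise ValueError("need at least min_chains available cores")
    timings = {cfg: measure(m, n) for cfg in configs for (m, n) in [cfg]}
    return min(timings, key=timings.get)
```

In use, `measure` would wrap `time_once` around the compiled model and the real data, for example `lambda m, n: time_once(sampler, m, n, warmup=200, seed=1)`, where `sampler`, the warmup count and the seed are placeholders. The selected (m, n) would then be reused for the full-length sampling run.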