On Optimizing the Arithmetic Precision of MCMC Algorithms

Grigorios Mingas, Farhan Rahman, C. Bouganis
{"title":"On Optimizing the Arithmetic Precision of MCMC Algorithms","authors":"Grigorios Mingas, Farhan Rahman, C. Bouganis","doi":"10.1109/FCCM.2013.31","DOIUrl":null,"url":null,"abstract":"Markov Chain Monte Carlo (MCMC) is an ubiquitous stochastic method, used to draw random samples from arbitrary probability distributions, such as the ones encountered in Bayesian inference. MCMC often requires forbiddingly long runtimes to give a representative sample in problems with high dimensions and large-scale data. Field-Programmable Gate Arrays (FPGAs) have proven to be a suitable platform for MCMC acceleration due to their ability to support massive parallelism. This paper introduces an automated method, which minimizes the floating point precision of the most computationally intensive part of an FPGA-mapped MCMC sampler, while keeping the precision-related bias in the output within a user-specified tolerance. The method is based on an efficient bias estimator, proposed here, which is able to estimate the bias in the output with only few random samples. The optimization process involves FPGA pre-runs, which estimate the bias and choose the optimized precision. This precision is then used to reconfigure the FPGA for the final, long MCMC run, allowing for higher sampling throughputs. The process requires no user intervention. The method is tested on two Bayesian inference case studies: Mixture models and neural network regression. The achieved speedups over double-precision FPGA designs were 3.5x-5x (including the optimization overhead). Comparisons with a sequential CPU and a GPGPU showed speedups of 223x-446x and 16x-18x respectively.","PeriodicalId":269887,"journal":{"name":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2013.31","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

Markov Chain Monte Carlo (MCMC) is an ubiquitous stochastic method, used to draw random samples from arbitrary probability distributions, such as the ones encountered in Bayesian inference. MCMC often requires forbiddingly long runtimes to give a representative sample in problems with high dimensions and large-scale data. Field-Programmable Gate Arrays (FPGAs) have proven to be a suitable platform for MCMC acceleration due to their ability to support massive parallelism. This paper introduces an automated method, which minimizes the floating point precision of the most computationally intensive part of an FPGA-mapped MCMC sampler, while keeping the precision-related bias in the output within a user-specified tolerance. The method is based on an efficient bias estimator, proposed here, which is able to estimate the bias in the output with only few random samples. The optimization process involves FPGA pre-runs, which estimate the bias and choose the optimized precision. This precision is then used to reconfigure the FPGA for the final, long MCMC run, allowing for higher sampling throughputs. The process requires no user intervention. The method is tested on two Bayesian inference case studies: Mixture models and neural network regression. The achieved speedups over double-precision FPGA designs were 3.5x-5x (including the optimization overhead). Comparisons with a sequential CPU and a GPGPU showed speedups of 223x-446x and 16x-18x respectively.
MCMC算法的算法精度优化
马尔可夫链蒙特卡罗(MCMC)是一种普遍存在的随机方法,用于从任意概率分布中抽取随机样本,例如在贝叶斯推理中遇到的随机样本。在具有高维和大规模数据的问题中,MCMC通常需要非常长的运行时间才能给出具有代表性的样本。现场可编程门阵列(fpga)由于其支持大规模并行的能力,已被证明是MCMC加速的合适平台。本文介绍了一种自动化方法,该方法可以最大限度地降低fpga映射MCMC采样器中计算最密集部分的浮点精度,同时保持输出中与精度相关的偏差在用户指定的公差范围内。该方法基于一种有效的偏差估计器,可以在少量随机样本的情况下估计输出中的偏差。优化过程包括FPGA预运行,预运行预估偏置并选择优化精度。然后,这种精度用于重新配置FPGA,以实现最终的长时间MCMC运行,从而允许更高的采样吞吐量。该过程不需要用户干预。该方法在混合模型和神经网络回归两个贝叶斯推理案例中进行了测试。通过双精度FPGA设计实现的加速是3.5 -5倍(包括优化开销)。与顺序CPU和GPGPU的比较显示,速度分别为223x-446x和16x-18x。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信