M. Hosseini, Rashidul Islam, A. Kulkarni, T. Mohsenin
2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), published 2017-04-01. DOI: 10.1109/FCCM.2017.56 (https://doi.org/10.1109/FCCM.2017.56)
A Scalable FPGA-Based Accelerator for High-Throughput MCMC Algorithms
Markov Chain Monte Carlo (MCMC) algorithms draw samples from any target probability distribution and are widely used in stochastic processing techniques. Such techniques, including machine learning and image processing, must process large amounts of data in real time, so high-throughput MCMC samplers are of utmost importance. Parallel Tempering (PT) MCMC has been shown to mix and converge better on high-dimensional and multi-modal distributions than other popular MCMC algorithms. In this paper, we employ a special case of Dth-order Markov chains to modify the PT-MCMC algorithm; we call the modified algorithm "Multiple Parallel Tempering" (MPT). The modification converts one MCMC sampler into multiple independent samplers that generate their samples and interleave them on one output line every clock cycle. A fully scalable, pipelined hardware accelerator for the PT sampler and the proposed MPT sampler is designed and implemented on a Xilinx Artix-7 FPGA for chain numbers of 1, 2, and 8. Post-place-and-route implementation results indicate that, for chain numbers of 1, 2, and 8, the throughput of the proposed MPT sampler is 31x, 31x, and 28x higher, respectively, than that of the PT sampler with the same chain-number configuration.
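The interleaving idea can be illustrated in software. The following is a minimal sketch, not the hardware design from the paper: it assumes a one-dimensional bimodal target, a random-walk Metropolis update, and a single adjacent-pair swap proposal per step, none of which are specified in the abstract. The names `log_target`, `pt_sampler`, `interleave`, and `mpt_samples` are hypothetical. Under one reading of the abstract, D independent PT samplers take turns writing to one output stream, so the sample at output position t depends only on the sample at position t - D, which is a special case of a Dth-order Markov chain.

```python
import math
import random


def log_target(x):
    """Unnormalized log-density of an assumed bimodal target: N(-3, 1) + N(3, 1)."""
    a = -0.5 * (x + 3.0) ** 2
    b = -0.5 * (x - 3.0) ** 2
    m = max(a, b)  # log-sum-exp for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(log_ratio, rng):
    """Metropolis test: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def pt_sampler(n_samples, temps, rng, step=1.0):
    """Parallel tempering generator; yields the state of the coldest chain.

    temps[0] is assumed to be 1.0 so that chain 0 targets the true density.
    """
    xs = [0.0] * len(temps)
    for _ in range(n_samples):
        # Random-walk Metropolis update inside each tempered chain.
        for i, t in enumerate(temps):
            prop = xs[i] + rng.gauss(0.0, step)
            if accept((log_target(prop) - log_target(xs[i])) / t, rng):
                xs[i] = prop
        # Propose exchanging the states of one random adjacent pair of chains.
        if len(temps) > 1:
            i = rng.randrange(len(temps) - 1)
            d_beta = 1.0 / temps[i] - 1.0 / temps[i + 1]
            d_logp = log_target(xs[i + 1]) - log_target(xs[i])
            if accept(d_beta * d_logp, rng):
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
        yield xs[0]


def interleave(samplers):
    """Round-robin merge: one sample from each sampler in turn."""
    for group in zip(*samplers):
        yield from group


def mpt_samples(d, n_per_sampler, temps, seed=0):
    """Run d independent PT samplers and interleave them on one output stream."""
    samplers = [
        pt_sampler(n_per_sampler, temps, random.Random(seed + k))
        for k in range(d)
    ]
    return list(interleave(samplers))


# Example: 4 interleaved samplers, each with 4 tempered chains.
stream = mpt_samples(d=4, n_per_sampler=1000, temps=[1.0, 2.0, 4.0, 8.0])
print(len(stream), sum(stream) / len(stream))
```

Because each sampler owns its random number generator and state, taking every Dth element of the output stream recovers exactly the sequence one sampler would have produced on its own. In a pipelined datapath, this independence is what would let a new sample enter the pipeline each cycle without waiting for the previous one to finish; the sketch above only mirrors the ordering of the samples, not the timing.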