fpga大数据应用的通信感知MCMC方法

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Pub Date : 2017-04-01 DOI:10.1109/FCCM.2017.9

Shuanglong Liu, C. Bouganis

{"title":"fpga大数据应用的通信感知MCMC方法","authors":"Shuanglong Liu, C. Bouganis","doi":"10.1109/FCCM.2017.9","DOIUrl":null,"url":null,"abstract":"Markov Chain Monte Carlo (MCMC) based methods have been the main tool for Bayesian Inference for some years now, and recently they find increasing applications in modern statistics and machine learning. Nevertheless, with the availability of large datasets and increasing complexity of Bayesian models, MCMC methods are becoming prohibitively expensive for real-world problems. At the heart of these methods, lies the computation of likelihood functions that requires access to all input data points in each iteration of the method. Current approaches, based on data subsampling, aim to accelerate these algorithms by reducing the number of the data points for likelihood evaluations at each MCMC iteration. However the existing work doesn't consider the properties of modern memory hierarchies, but treats the memory as one monolithic storage space. This paper proposes a communication-aware MCMC framework that takes into account the underlying performance of the memory subsystem. The framework is based on a novel subsampling algorithm that utilises an unbiased likelihood estimator based on Probability Proportional-to-Size (PPS) sampling, allowing information on the performance of the memory system to be taken into account during the sampling stage. The proposed MCMC sampler is mapped to an FPGA device and its performance is evaluated using the Bayesian logistic regression model on MNIST dataset. The proposed system achieves a 3.37x speed up over a highly optimised traditional FPGA design, therefore the risk in the estimates based on the generated samples is largely decreased.","PeriodicalId":124631,"journal":{"name":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Communication-Aware MCMC Method for Big Data Applications on FPGAs\",\"authors\":\"Shuanglong Liu, C. Bouganis\",\"doi\":\"10.1109/FCCM.2017.9\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Markov Chain Monte Carlo (MCMC) based methods have been the main tool for Bayesian Inference for some years now, and recently they find increasing applications in modern statistics and machine learning. Nevertheless, with the availability of large datasets and increasing complexity of Bayesian models, MCMC methods are becoming prohibitively expensive for real-world problems. At the heart of these methods, lies the computation of likelihood functions that requires access to all input data points in each iteration of the method. Current approaches, based on data subsampling, aim to accelerate these algorithms by reducing the number of the data points for likelihood evaluations at each MCMC iteration. However the existing work doesn't consider the properties of modern memory hierarchies, but treats the memory as one monolithic storage space. This paper proposes a communication-aware MCMC framework that takes into account the underlying performance of the memory subsystem. The framework is based on a novel subsampling algorithm that utilises an unbiased likelihood estimator based on Probability Proportional-to-Size (PPS) sampling, allowing information on the performance of the memory system to be taken into account during the sampling stage. The proposed MCMC sampler is mapped to an FPGA device and its performance is evaluated using the Bayesian logistic regression model on MNIST dataset. The proposed system achieves a 3.37x speed up over a highly optimised traditional FPGA design, therefore the risk in the estimates based on the generated samples is largely decreased.\",\"PeriodicalId\":124631,\"journal\":{\"name\":\"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/FCCM.2017.9\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2017.9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

基于马尔可夫链蒙特卡罗(MCMC)的方法多年来一直是贝叶斯推理的主要工具，最近它们在现代统计学和机器学习中的应用越来越多。然而，随着大型数据集的可用性和贝叶斯模型的复杂性的增加，MCMC方法对于现实世界的问题变得过于昂贵。这些方法的核心在于计算似然函数，该函数需要访问方法每次迭代中的所有输入数据点。目前基于数据子采样的方法旨在通过减少每次MCMC迭代中进行似然评估的数据点数量来加速这些算法。然而，现有的工作没有考虑到现代内存层次结构的特性，而是将内存视为一个单一的存储空间。本文提出了一种考虑存储子系统底层性能的通信感知MCMC框架。该框架基于一种新颖的子采样算法，该算法利用基于概率比例大小(PPS)采样的无偏似然估计器，允许在采样阶段考虑存储系统性能的信息。将所提出的MCMC采样器映射到FPGA器件上，并在MNIST数据集上使用贝叶斯逻辑回归模型对其性能进行了评估。与高度优化的传统FPGA设计相比，所提出的系统实现了3.37倍的速度提升，因此基于生成样本的估计风险大大降低。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Communication-Aware MCMC Method for Big Data Applications on FPGAs

Markov Chain Monte Carlo (MCMC) based methods have been the main tool for Bayesian Inference for some years now, and recently they find increasing applications in modern statistics and machine learning. Nevertheless, with the availability of large datasets and increasing complexity of Bayesian models, MCMC methods are becoming prohibitively expensive for real-world problems. At the heart of these methods, lies the computation of likelihood functions that requires access to all input data points in each iteration of the method. Current approaches, based on data subsampling, aim to accelerate these algorithms by reducing the number of the data points for likelihood evaluations at each MCMC iteration. However the existing work doesn't consider the properties of modern memory hierarchies, but treats the memory as one monolithic storage space. This paper proposes a communication-aware MCMC framework that takes into account the underlying performance of the memory subsystem. The framework is based on a novel subsampling algorithm that utilises an unbiased likelihood estimator based on Probability Proportional-to-Size (PPS) sampling, allowing information on the performance of the memory system to be taken into account during the sampling stage. The proposed MCMC sampler is mapped to an FPGA device and its performance is evaluated using the Bayesian logistic regression model on MNIST dataset. The proposed system achieves a 3.37x speed up over a highly optimised traditional FPGA design, therefore the risk in the estimates based on the generated samples is largely decreased.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

自引率

0.00%

发文量