Authors: Jiang Su, David B. Thomas, P. Cheung
Venue: 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Published: 2016-05-01
DOI: 10.1109/FCCM.2016.23 (https://doi.org/10.1109/FCCM.2016.23)
Citations: 8
Increasing Network Size and Training Throughput of FPGA Restricted Boltzmann Machines Using Dropout
Abstract
Restricted Boltzmann Machines (RBMs) are widely used in modern machine learning tasks. Existing FPGA implementations are limited in network size and training throughput by the available DSP resources. In this work we propose a new algorithm and architecture for FPGAs, called the dropout-RBM (dRBM) system. Compared to state-of-the-art design methods on the same FPGA, dRBM with a dropout rate of 0.5 doubles the maximum affordable network size while using only half of the DSP and BRAM resources. This is achieved by applying dropout, a relatively new technique normally used to avoid overfitting. Here we instead apply dropout to reduce the required DSP and BRAM resources, with the side effect of increasing the robustness of training. To improve processing throughput, we also propose a multi-mode matrix multiplication module that maximizes DSP efficiency. On the MNIST classification benchmark, a Stratix IV EP4SGX530 FPGA running dRBM is 34x faster than a single-precision Matlab implementation running on a 2.9 GHz Intel i7 CPU.
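The abstract does not describe the dRBM datapath, so the following is only a software sketch of why a dropout rate of 0.5 can halve the arithmetic per training step. It assumes one-step contrastive divergence (CD-1) training and a fixed-size random subset of retained hidden units; both are assumptions made here for illustration and are not stated in the source. The function name `cd1_dropout_step` and all parameter names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_dropout_step(W, b_vis, b_hid, v0, keep_idx, lr, rng):
    """One CD-1 update that touches only the hidden units in keep_idx.

    Hypothetical sketch: the arrays are updated in place and the function
    returns the number of multiply-accumulates (MACs) spent on W.
    """
    W_sub = W[:, keep_idx]                  # (n_vis, n_kept) weight slice
    c_sub = b_hid[keep_idx]

    # Positive phase: hidden probabilities and a binary sample.
    h0_prob = sigmoid(v0 @ W_sub + c_sub)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(W.dtype)

    # Negative phase: reconstruct the visibles, then the hiddens again.
    v1_prob = sigmoid(h0 @ W_sub.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W_sub + c_sub)

    # Gradient step on the retained columns only; dropped columns are untouched.
    batch = v0.shape[0]
    W[:, keep_idx] += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
    b_vis += lr * (v0 - v1_prob).mean(axis=0)
    b_hid[keep_idx] += lr * (h0_prob - h1_prob).mean(axis=0)

    # Three matrix products plus two outer-product accumulations,
    # each costing n_vis * n_kept MACs per sample.
    return 5 * batch * W_sub.size

rng = np.random.default_rng(0)
n_vis, n_hid, batch = 784, 512, 32          # illustrative sizes only
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
v0 = (rng.random((batch, n_vis)) < 0.5).astype(W.dtype)

# Dropout rate 0.5: keep exactly half of the hidden units for this step.
keep_idx = rng.choice(n_hid, size=n_hid // 2, replace=False)
macs_dropout = cd1_dropout_step(W, b_vis, b_hid, v0, keep_idx, 0.1, rng)
macs_full = cd1_dropout_step(W, b_vis, b_hid, v0, np.arange(n_hid), 0.1, rng)
print(macs_dropout / macs_full)             # 0.5
```

Because only the retained columns of the weight matrix are read and written in a step, the same multiplier budget could in principle serve a network with twice as many hidden units, which is consistent with the trade-off the abstract reports. How the actual design selects units and maps them onto DSP blocks and BRAM is not covered by this sketch.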