FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off
Kaan Kara, Dan Alistarh, G. Alonso, O. Mutlu, Ce Zhang
2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), April 2017. DOI: 10.1109/FCCM.2017.39
Stochastic gradient descent (SGD) is a commonly used algorithm for training linear machine learning models. Because it is built on vector algebra, it benefits from the inherent parallelism available in an FPGA. In this paper, we first present a single-precision floating-point SGD implementation on an FPGA that provides performance comparable to a 10-core CPU. We then adapt the design to process low-precision data. The low-precision data is obtained from a novel compression scheme called stochastic quantization, designed specifically for machine learning applications. We test both the full-precision and the low-precision designs on various regression and classification data sets. Using low-precision data, we achieve up to an order of magnitude speedup in training over both full-precision SGD on the same FPGA and a state-of-the-art multi-core solution, while maintaining the quality of training. We open source the designs presented in this paper.
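To make the precision-convergence idea concrete, the sketch below pairs unbiased stochastic quantization with plain SGD on a small linear regression task. This is a software illustration of the general technique, not the authors' open-sourced FPGA design; the grid size, learning rate, and helper names (stochastic_quantize, sgd_linear_regression) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def stochastic_quantize(x, num_levels=16):
    """Round each value of x to a neighbouring point on a uniform grid,
    choosing up/down with probabilities that make the result unbiased
    (E[quantized x] = x). num_levels=16 corresponds to 4-bit data."""
    lo, hi = x.min(), x.max()
    if hi == lo:                              # degenerate vector: nothing to quantize
        return x.copy()
    scale = (hi - lo) / (num_levels - 1)
    pos = (x - lo) / scale                    # position on the quantization grid
    floor = np.floor(pos)
    prob_up = pos - floor                     # probability of rounding up
    rounded = floor + (np.random.rand(*x.shape) < prob_up)
    return lo + rounded * scale

def sgd_linear_regression(X, y, epochs=10, lr=0.01, quantize=None):
    """Plain SGD for least-squares linear regression; optionally quantize
    each sample before it is used to compute the gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in np.random.permutation(len(y)):
            a = X[i] if quantize is None else quantize(X[i])
            grad = (a @ w - y[i]) * a         # gradient of 0.5 * (a.w - y)^2
            w -= lr * grad
    return w

# Toy comparison: full-precision samples vs. 4-bit stochastically quantized samples.
np.random.seed(0)
X = np.random.randn(1000, 8)
w_true = np.random.randn(8)
y = X @ w_true
w_full = sgd_linear_regression(X, y)
w_quant = sgd_linear_regression(X, y, quantize=stochastic_quantize)
print("error (full precision):", np.linalg.norm(w_full - w_true))
print("error (quantized):     ", np.linalg.norm(w_quant - w_true))
```

Because the quantizer is unbiased, the quantized gradients equal the full-precision gradients in expectation, which is why training quality can be preserved while each sample is represented with far fewer bits and can therefore be streamed through the accelerator much faster.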