FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). Pub Date: 2017-04-01. DOI: 10.1109/FCCM.2017.39

Kaan Kara, Dan Alistarh, G. Alonso, O. Mutlu, Ce Zhang

Citations: 66

Abstract

Stochastic gradient descent (SGD) is a commonly used algorithm for training linear machine learning models. Because it is based on vector algebra, it benefits from the inherent parallelism available in an FPGA. In this paper, we first present a single-precision floating-point SGD implementation on an FPGA that provides performance similar to that of a 10-core CPU. We then adapt the design to make it capable of processing low-precision data. The low-precision data is obtained from a novel compression scheme called stochastic quantization, which is designed specifically for machine learning applications. We test both the full-precision and the low-precision designs on various regression and classification data sets. When using low-precision data, we achieve up to an order of magnitude training speedup compared to full-precision SGD on the same FPGA and to a state-of-the-art multi-core solution, while maintaining the quality of training. We open-source the designs presented in this paper.
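The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of one common reading of stochastic quantization: unbiased randomized rounding of a value to one of 2^bits evenly spaced levels, so that the quantized value equals the original in expectation. The function names, the [lo, hi] normalization range, and the least-squares SGD step are illustrative assumptions, not the authors' FPGA design.

```python
import math
import random


def stochastic_quantize(x, bits, lo=0.0, hi=1.0, rng=random):
    """Randomly round x in [lo, hi] to one of 2**bits evenly spaced levels.

    Assumed scheme (not taken from the paper): the upper neighbouring level
    is chosen with probability equal to x's fractional position between the
    two neighbouring levels, which makes the result unbiased: E[q(x)] == x.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    if not lo <= x <= hi:
        raise ValueError("x must lie in [lo, hi]")
    intervals = (1 << bits) - 1               # gaps between adjacent levels
    scaled = (x - lo) / (hi - lo) * intervals
    lower = math.floor(scaled)
    frac = scaled - lower                     # position above the lower level
    level = lower + (1 if rng.random() < frac else 0)
    return lo + level * (hi - lo) / intervals


def sgd_step(weights, features, label, lr):
    """One SGD step for least-squares linear regression on a single sample."""
    prediction = sum(w * a for w, a in zip(weights, features))
    error = prediction - label
    return [w - lr * error * a for w, a in zip(weights, features)]


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [0.30, 0.85]
    # Quantize each feature to 2 bits before feeding it to the SGD step.
    quantized = [stochastic_quantize(a, bits=2, rng=rng) for a in sample]
    print(quantized, sgd_step([0.0, 0.0], quantized, label=1.0, lr=0.1))
```

Because the rounding is unbiased, a gradient computed from quantized features is a noisy estimate of the full-precision gradient, which is the intuition behind trading precision against convergence; fewer bits per value also mean more values per memory transfer.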
