ACM Transactions on Computer Systems (TOCS). Pub Date: 2021-03-01. DOI: 10.1145/3444943

Chaim Baskin, Natan Liss, Eli Schwartz, Evgenii Zheltonozhskii, R. Giryes, A. Bronstein, A. Mendelson

Citations: 6

Abstract


UNIQ

We present a novel method for neural network quantization. Our method, named UNIQ, emulates a non-uniform k-quantile quantizer and adapts the model to perform well with quantized weights by injecting noise into the weights at training time. As a by-product of injecting noise into the weights, we find that activations can also be quantized to as few as 8 bits with only a minor accuracy degradation. Our non-uniform quantization approach provides a novel alternative to the existing uniform quantization techniques for neural networks. We further propose a novel complexity metric, the number of bit operations performed (BOPs), and we show that this metric has a linear relation with logic utilization and power. We suggest evaluating the trade-off of accuracy vs. complexity (BOPs). The proposed method, when evaluated on ResNet18/34/50 and MobileNet on ImageNet, outperforms the prior state of the art in both the low-complexity regime and the high-accuracy regime. We demonstrate the practical applicability of this approach by implementing our non-uniformly quantized CNN on an FPGA.
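The two ingredients named in the abstract, a non-uniform k-quantile quantizer and training-time noise injection, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`k_quantile_quantizer`, `quantize`, `inject_quantization_noise`) are hypothetical, the use of the empirical CDF and of the bin median as representative value are assumptions made here for illustration, and the abstract itself does not specify these details.

```python
import random
import statistics
from bisect import bisect_right


def k_quantile_quantizer(weights, k):
    """Build a k-level quantizer whose bins hold (almost) equally many samples.

    Returns (thresholds, levels): k-1 bin edges and k representative values.
    """
    ordered = sorted(weights)
    n = len(ordered)
    if n < k:
        raise ValueError("need at least k samples")
    # Split the sorted samples into k contiguous, equally populated bins.
    bounds = [round(i * n / k) for i in range(k + 1)]
    bins = [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]
    # Assumption: each bin is represented by its median.
    levels = [statistics.median(b) for b in bins]
    # Each edge sits halfway between neighbouring bins.
    thresholds = [(bins[i][-1] + bins[i + 1][0]) / 2 for i in range(k - 1)]
    return thresholds, levels


def quantize(w, thresholds, levels):
    """Map a weight to the representative value of the bin it falls in."""
    return levels[bisect_right(thresholds, w)]


def inject_quantization_noise(weights, k, rng):
    """Training-time stand-in for quantization (illustrative assumption).

    Each weight is moved to quantile space via the empirical CDF, perturbed by
    uniform noise one bin wide (1/k), and mapped back to a sample value.
    """
    ordered = sorted(weights)
    n = len(ordered)
    noisy = []
    for w in weights:
        u = (bisect_right(ordered, w) - 0.5) / n   # empirical CDF value of w
        u += rng.uniform(-0.5 / k, 0.5 / k)        # noise of one bin width
        u = min(max(u, 0.0), 1.0)                  # stay inside [0, 1]
        idx = min(int(u * n), n - 1)               # empirical inverse CDF
        noisy.append(ordered[idx])
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    # Bell-shaped toy "weights"; the quantization levels crowd near zero.
    w = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    thresholds, levels = k_quantile_quantizer(w, k=4)
    print("levels:", levels)
    print("noisy sample:", inject_quantization_noise(w[:5], 4, rng))
```

Because the bins are equally populated rather than equally wide, the levels are dense where weights are dense, which is what distinguishes this from a uniform quantizer. The noise step perturbs each weight by roughly the amount that quantization would, so a model trained with it sees quantization-like distortion while remaining differentiable.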

