C3-Flow: Compute Compression Co-Design Flow for Deep Neural Networks

Proceedings of the 56th Annual Design Automation Conference 2019. Pub Date: 2019-06-02. DOI: 10.1145/3316781.3317786

Matthew Sotoudeh, Sara S. Baghsorkhi

Citations: 2

Abstract

Existing approaches to neural network compression have failed to holistically address the algorithmic (training accuracy) and computational (inference performance) demands of real-world systems, particularly on resource-constrained devices. We present C3-Flow, a new approach that adds non-uniformity to low-rank approximations and is designed specifically to enable highly efficient computation on common hardware architectures while retaining more accuracy than competing methods. Evaluation on two state-of-the-art acoustic models (versus existing work, empirical limit study approaches, and hand-tuned models) demonstrates up to 60% lower error. Finally, we show that our co-design approach achieves up to 14x inference speedup across three Haswell- and Broadwell-based platforms.
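As background on why low-rank approximation can speed up inference, the sketch below counts multiply-adds for a dense layer against a plain rank-r factorization W ≈ UV. It is a generic illustration, not the C3-Flow method: it does not model the non-uniformity the abstract describes, and the function names, layer size, and rank are hypothetical values chosen for the example.

```python
def dense_cost(m: int, n: int) -> int:
    """Multiply-adds (and stored weights) for y = W x, with W of shape (m, n)."""
    return m * n


def low_rank_cost(m: int, n: int, r: int) -> int:
    """Multiply-adds (and stored weights) for y = U (V x), U: (m, r), V: (r, n)."""
    return r * (m + n)


def break_even_rank(m: int, n: int) -> float:
    """Rank below which the factorized layer is cheaper than the dense one."""
    return m * n / (m + n)


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only for illustration.
    m, n, r = 1024, 1024, 64
    dense = dense_cost(m, n)
    factored = low_rank_cost(m, n, r)
    print(f"dense multiply-adds:    {dense}")
    print(f"rank-{r} multiply-adds: {factored}")
    print(f"theoretical reduction:  {dense / factored:.1f}x")
    print(f"break-even rank:        {break_even_rank(m, n):.0f}")
```

These counts are only a theoretical bound on arithmetic work. Realized speedup on a given platform also depends on memory bandwidth, cache behavior, and how well the resulting matrix shapes map onto the hardware, which is the kind of concern a compute/compression co-design approach is meant to address.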

