DNNFusion: accelerating deep neural networks execution with advanced operator fusion

Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. Pub date: 2020-09-30. DOI: 10.1145/3453483.3454083

Wei Niu, Jiexiong Guan, Yanzhi Wang, G. Agrawal, Bin Ren


Citations: 72

Abstract

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, which leads to high memory and computational requirements for inference. Operator fusion (also called kernel or layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks that aim to improve the efficiency of DNN inference, such as TensorFlow, TVM, and MNN. However, these frameworks usually adopt fusion approaches based on fixed patterns. These patterns are too restrictive to cover the diversity of operators and layer connections, especially those seen in many extremely deep models. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes DNNFusion, a novel and extensive loop fusion framework. The basic idea is to operate on an operator-level view of DNNs while expanding fusion opportunities through a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel graph rewriting framework based on mathematical properties, which reduces evaluation costs and facilitates subsequent operator fusion, 2) integrated fusion plan generation that leverages high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied task types, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8× more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with up to 9.3× speedup. The reduced memory requirements and the speedups can enable many of the target models to run on mobile devices, and can even make them part of real-time applications.
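To illustrate why operator fusion matters, the sketch below compares an unfused chain of element-wise (one-to-one) operators with a single fused kernel. The unfused version writes and re-reads an intermediate buffer after every operator. The fused version makes one pass over the input. This is only a minimal sketch and not DNNFusion's code generator. It assumes plain Python lists as stand-ins for tensors, and every function name (`mul`, `add`, `relu`, `unfused`, `fuse_elementwise`) is hypothetical. Fusing a chain of element-wise operators is the simplest, pattern-friendly case. The abstract argues that DNNFusion goes beyond such fixed patterns.

```python
from typing import Callable, List

Tensor = List[float]  # hypothetical stand-in for a 1-D tensor


# Hypothetical element-wise operators; each one materializes a new buffer.
def mul(xs: Tensor, s: float) -> Tensor:
    return [x * s for x in xs]


def add(xs: Tensor, b: float) -> Tensor:
    return [x + b for x in xs]


def relu(xs: Tensor) -> Tensor:
    return [max(x, 0.0) for x in xs]


def unfused(xs: Tensor, s: float, b: float) -> Tensor:
    # Three separate "kernels": two intermediate tensors are written and re-read.
    return relu(add(mul(xs, s), b))


def fuse_elementwise(*ops: Callable[[float], float]) -> Callable[[Tensor], Tensor]:
    """Compose scalar element-wise ops into one kernel that makes a single pass."""
    def kernel(xs: Tensor) -> Tensor:
        out: Tensor = []
        for x in xs:
            for op in ops:  # apply every op while the value is still "in register"
                x = op(x)
            out.append(x)
        return out
    return kernel


if __name__ == "__main__":
    fused = fuse_elementwise(lambda x: x * 2.0, lambda x: x - 1.0, lambda x: max(x, 0.0))
    print(fused([0.0, 1.0, -3.0]))  # [0.0, 1.0, 0.0]
```

Both versions compute the same values. The fused kernel avoids allocating and traversing the two intermediate lists, which is the memory-traffic saving that fusion targets on real hardware.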

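The abstract mentions graph rewriting based on mathematical properties that reduces evaluation cost before fusion. The sketch below shows the general idea with one distributivity rule, `a*b + a*c -> a*(b + c)`, applied bottom-up to a tiny expression tree of element-wise operators. It is an illustrative assumption rather than the paper's rule set. The `Var`/`Op` classes and the functions `rewrite_distributive` and `count_ops` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Op:
    kind: str  # "add" or "mul" (element-wise); hypothetical operator set
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Var, Op]


def rewrite_distributive(e: Expr) -> Expr:
    """Bottom-up rewrite of a*b + a*c into a*(b + c), saving one multiplication per match."""
    if isinstance(e, Var):
        return e
    lhs, rhs = rewrite_distributive(e.lhs), rewrite_distributive(e.rhs)
    if (e.kind == "add" and isinstance(lhs, Op) and isinstance(rhs, Op)
            and lhs.kind == rhs.kind == "mul" and lhs.lhs == rhs.lhs):
        # Common left factor found: factor it out.
        return Op("mul", lhs.lhs, Op("add", lhs.rhs, rhs.rhs))
    return Op(e.kind, lhs, rhs)


def count_ops(e: Expr, kind: str) -> int:
    """Count operator nodes of the given kind, a crude proxy for evaluation cost."""
    if isinstance(e, Var):
        return 0
    return int(e.kind == kind) + count_ops(e.lhs, kind) + count_ops(e.rhs, kind)


if __name__ == "__main__":
    a, b, c = Var("a"), Var("b"), Var("c")
    expr = Op("add", Op("mul", a, b), Op("mul", a, c))
    out = rewrite_distributive(expr)
    print(count_ops(expr, "mul"), "->", count_ops(out, "mul"))  # 2 -> 1
```

This sketch matches only a common *left* factor. A fuller rewriter would also use commutativity and associativity to expose more matches. Rewrites like this are exact in real arithmetic, but floating-point results can differ slightly after reassociation or factoring.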
