EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers

Lijuan Jiang, Ping Xu, Qianchao Zhu, Xiuhong Li, Shengen Yan, Xingcheng Zhang, Dahua Lin, Wen-Jing Ma, Zhouyang Li, Jun Liu, Jinming Ma, Minxi Jin, Chao Yang
{"title":"EasyView: Enabling and Scheduling Tensor Views in Deep Learning Compilers","authors":"Lijuan Jiang, Ping Xu, Qianchao Zhu, Xiuhong Li, Shengen Yan, Xingcheng Zhang, Dahua Lin, Wen-Jing Ma, Zhouyang Li, Jun Liu, Jinming Ma, Minxi Jin, Chao Yang","doi":"10.1145/3545008.3545037","DOIUrl":null,"url":null,"abstract":"In recent years, memory-intensive operations are becoming dominant in efficiency of running novel neural networks. Just-in-time operator fusion on accelerating devices like GPU proves an effective method for optimizing memory-intensive operations, and suits the numerous varying model structures. In particular, we find memory-intensive operations on tensor views are ubiquitous in neural network implementations. Tensors are the de facto representation for numerical data in deep learning areas, while tensor views cover a bunch of sophisticated syntax, which allow various interpretations on the underlying tensor data without memory copy. The support of views in deep learning compilers could greatly enlarge operator fusion scope, and appeal to optimizing novel neural networks. Nevertheless, mainstream solutions in state-of-the-art deep learning compilers exhibit imperfections either in view syntax representations or operator fusion. In this article, we propose EasyView, which enables and schedules tensor views in an end-to-end workflow from neural networks onto devices. Aiming at maximizing memory utilization and reducing data movement, we categorize various view contexts in high-level language, and lower views in accordance with different scenarios. Reference-semantic in terms of views are kept in the lowering from native high-level language features to intermediate representations. Based on the reserved reference-semantics, memory activities related to data dependence of read and write are tracked for further compute and memory optimization. Besides, ample operator fusion is applied to memory-intensive operations with views. 
In our tests, the proposed work could get average 5.63X, 2.44X, and 4.67X speedup compared with the XLA, JAX, and TorchScript, respectively for hotspot Python functions. In addition, operation fusion with views could bring 8.02% performance improvement in end-to-end neural networks.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545037","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In recent years, memory-intensive operations have come to dominate the runtime efficiency of modern neural networks. Just-in-time operator fusion on accelerators such as GPUs has proven an effective way to optimize memory-intensive operations, and it adapts well to the wide variety of model structures. In particular, we find that memory-intensive operations on tensor views are ubiquitous in neural network implementations. Tensors are the de facto representation for numerical data in deep learning, and tensor views provide a rich syntax that allows varied interpretations of the underlying tensor data without copying memory. Supporting views in deep learning compilers can greatly enlarge the scope of operator fusion and is attractive for optimizing new neural networks. Nevertheless, mainstream solutions in state-of-the-art deep learning compilers fall short either in view syntax representation or in operator fusion. In this article, we propose EasyView, which enables and schedules tensor views in an end-to-end workflow from neural networks to devices. To maximize memory utilization and reduce data movement, we categorize the view contexts that arise in the high-level language and lower views differently according to each scenario. The reference semantics of views are preserved in the lowering from native high-level language features to intermediate representations. Based on the preserved reference semantics, memory activities related to read and write data dependences are tracked for further compute and memory optimization. In addition, extensive operator fusion is applied to memory-intensive operations involving views. In our tests, the proposed work achieves average speedups of 5.63x, 2.44x, and 4.67x over XLA, JAX, and TorchScript, respectively, on hotspot Python functions. Moreover, operator fusion with views brings an 8.02% performance improvement on end-to-end neural networks.
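The "view" behavior the abstract refers to — reinterpreting the same underlying buffer without a memory copy, with reads and writes through the view aliasing the original tensor — can be illustrated with NumPy, whose view semantics mirror those of the tensor libraries the paper targets. This is an illustrative sketch only; EasyView itself operates on compiler intermediate representations, not on NumPy.

```python
import numpy as np

# A 1-D buffer of 6 elements.
base = np.arange(6)

# reshape() on a contiguous array returns a view: a new 2x3
# interpretation of the same underlying buffer, no memory copy.
view = base.reshape(2, 3)
assert view.base is base  # the view shares base's storage

# Writing through the view is visible through the original
# tensor -- the reference semantics a compiler must preserve
# when it lowers views and tracks read/write dependences.
view[0, 0] = 99
assert base[0] == 99

# Transposition is also a view: only the strides change.
t = view.T
assert t.shape == (3, 2)
assert not t.flags["OWNDATA"]  # t does not own its buffer
```

Because the write through `view` aliases `base`, a compiler that fuses operators across such views must track these read/write dependences rather than treat each tensor as independent storage.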