One-shot tuner for deep learning compilers

Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction. Pub Date: 2022-03-18. DOI: 10.1145/3497776.3517774

Jaehun Ryu, Eunhyeok Park, Hyojin Sung

Citations: 5

Abstract

Auto-tuning DL compilers are gaining ground as optimizing back-ends for DL frameworks. While existing work can generate deep learning models that exceed the performance of hand-tuned libraries, it still suffers from prohibitively long auto-tuning times due to repeated hardware measurements in large search spaces. In this paper, we take a neural-predictor-inspired approach to reduce the auto-tuning overhead and show that a performance predictor model trained prior to compilation can produce optimized tensor operation code without repeated search and hardware measurements. To generate a sample-efficient training dataset, we extend the input representation to include task-specific information and guide the data sampling methods to focus on learning high-performing code. We evaluated the resulting predictor model, One-Shot Tuner, against AutoTVM and other prior work; the results show that One-Shot Tuner speeds up compilation by 2.81x to 67.7x over these baselines while providing comparable or improved inference time for CNN and Transformer models.
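The core idea in the abstract, scoring every candidate configuration once with a pretrained predictor instead of measuring on hardware, can be illustrated with a minimal sketch. Everything below is hypothetical and is not the authors' implementation or any TVM API: `TASK`, `TILE_CHOICES`, `featurize`, `toy_predictor`, and `one_shot_select` are names invented for illustration, and the closed-form scoring function merely stands in for a model that would be trained offline.

```python
from itertools import product

# Hypothetical task descriptor: the shape of a matrix multiply (M, N, K).
TASK = {"M": 512, "N": 512, "K": 256}

# Hypothetical search space: candidate tile sizes for two loop axes.
TILE_CHOICES = [4, 8, 16, 32, 64]


def featurize(task, config):
    """Concatenate task-specific information with the configuration knobs."""
    tile_m, tile_n = config
    return [task["M"], task["N"], task["K"], tile_m, tile_n]


def toy_predictor(features):
    """Stand-in for a pretrained performance predictor (higher is better).

    A real predictor would be a learned model; this closed-form score is an
    assumption that exists only so the sketch runs.
    """
    m, n, _k, tile_m, tile_n = features
    # Reward tiles that divide the problem evenly; penalize distance from 32x16.
    divisibility = (m % tile_m == 0) + (n % tile_n == 0)
    return divisibility - abs(tile_m - 32) / 64 - abs(tile_n - 16) / 64


def one_shot_select(task, predictor=toy_predictor):
    """Score each candidate exactly once and return the best one.

    No hardware measurement and no iterative search loop are involved.
    """
    candidates = list(product(TILE_CHOICES, repeat=2))
    return max(candidates, key=lambda cfg: predictor(featurize(task, cfg)))


best = one_shot_select(TASK)
print(best)
```

The point of the sketch is the control flow: a conventional auto-tuner would alternate between proposing candidates and timing them on the device, whereas here selection is a single pass over predicted scores.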

