One-shot tuner for deep learning compilers

Jaehun Ryu, Eunhyeok Park, Hyojin Sung
{"title":"深度学习编译器的一次性调谐器","authors":"Jaehun Ryu, Eunhyeok Park, Hyojin Sung","doi":"10.1145/3497776.3517774","DOIUrl":null,"url":null,"abstract":"Auto-tuning DL compilers are gaining ground as an optimizing back-end for DL frameworks. While existing work can generate deep learning models that exceed the performance of hand-tuned libraries, they still suffer from prohibitively long auto-tuning time due to repeated hardware measurements in large search spaces. In this paper, we take a neural-predictor inspired approach to reduce the auto-tuning overhead and show that a performance predictor model trained prior to compilation can produce optimized tensor operation codes without repeated search and hardware measurements. To generate a sample-efficient training dataset, we extend input representation to include task-specific information and to guide data sampling methods to focus on learning high-performing codes. We evaluated the resulting predictor model, One-Shot Tuner, against AutoTVM and other prior work, and the results show that One-Shot Tuner speeds up compilation by 2.81x to 67.7x compared to prior work while providing comparable or improved inference time for CNN and Transformer models.","PeriodicalId":333281,"journal":{"name":"Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"One-shot tuner for deep learning compilers\",\"authors\":\"Jaehun Ryu, Eunhyeok Park, Hyojin Sung\",\"doi\":\"10.1145/3497776.3517774\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Auto-tuning DL compilers are gaining ground as an optimizing back-end for DL frameworks. While existing work can generate deep learning models that exceed the performance of hand-tuned libraries, they still suffer from prohibitively long auto-tuning time due to repeated hardware measurements in large search spaces. In this paper, we take a neural-predictor inspired approach to reduce the auto-tuning overhead and show that a performance predictor model trained prior to compilation can produce optimized tensor operation codes without repeated search and hardware measurements. To generate a sample-efficient training dataset, we extend input representation to include task-specific information and to guide data sampling methods to focus on learning high-performing codes. 
We evaluated the resulting predictor model, One-Shot Tuner, against AutoTVM and other prior work, and the results show that One-Shot Tuner speeds up compilation by 2.81x to 67.7x compared to prior work while providing comparable or improved inference time for CNN and Transformer models.\",\"PeriodicalId\":333281,\"journal\":{\"name\":\"Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3497776.3517774\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3497776.3517774","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Auto-tuning DL compilers are gaining ground as an optimizing back-end for DL frameworks. While existing work can generate deep learning models that exceed the performance of hand-tuned libraries, they still suffer from prohibitively long auto-tuning time due to repeated hardware measurements in large search spaces. In this paper, we take a neural-predictor inspired approach to reduce the auto-tuning overhead and show that a performance predictor model trained prior to compilation can produce optimized tensor operation codes without repeated search and hardware measurements. To generate a sample-efficient training dataset, we extend input representation to include task-specific information and to guide data sampling methods to focus on learning high-performing codes. We evaluated the resulting predictor model, One-Shot Tuner, against AutoTVM and other prior work, and the results show that One-Shot Tuner speeds up compilation by 2.81x to 67.7x compared to prior work while providing comparable or improved inference time for CNN and Transformer models.
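
A minimal sketch of the core idea, assuming a simple MLP predictor and synthetic feature encodings (the actual One-Shot Tuner model, input representation, and sampling strategy differ): a cost model trained offline, before compilation, scores every candidate tensor-code configuration in a single batched forward pass, so only the top-ranked candidates ever need to be compiled or measured on hardware.

```python
# Hypothetical illustration, not the paper's implementation: a pre-trained
# performance predictor ranks candidate schedule configurations without
# repeated hardware measurements. Feature layout and model size are assumptions.
import torch
import torch.nn as nn


class PerfPredictor(nn.Module):
    """Small MLP mapping an encoded (task, schedule-knob) vector to a
    predicted performance score (higher means faster)."""

    def __init__(self, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def rank_candidates(predictor: PerfPredictor,
                    task_feat: torch.Tensor,
                    knob_feats: torch.Tensor,
                    top_k: int = 8) -> torch.Tensor:
    """Score all candidate configurations in one batched pass and return the
    indices of the top-k predicted performers; only these few would then be
    compiled (or measured once) on the target hardware."""
    # Attach the task-specific context to every candidate's knob encoding.
    feats = torch.cat(
        [task_feat.expand(knob_feats.size(0), -1), knob_feats], dim=1)
    with torch.no_grad():
        scores = predictor(feats)
    return torch.topk(scores, k=min(top_k, scores.numel())).indices


if __name__ == "__main__":
    task_dim, knob_dim, num_candidates = 16, 32, 1024
    predictor = PerfPredictor(feature_dim=task_dim + knob_dim)
    # In practice the predictor is trained offline on measured samples;
    # random weights stand in for a trained model here.
    task_feat = torch.randn(1, task_dim)            # operator shape, dtype, target, ...
    knob_feats = torch.randn(num_candidates, knob_dim)  # tiling/unroll/vectorize knobs
    best = rank_candidates(predictor, task_feat, knob_feats)
    print("Top candidate configuration indices:", best.tolist())
```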