Reinforcement Learning based Efficient Mapping of DNN Models onto Accelerators
Shine Parekkadan Sunny, Satyajit Das
2022 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS), April 20, 2022
DOI: 10.1109/coolchips54332.2022.9772673
The input tensors in each layer of Deep Neural Network (DNN) models are often partitioned (tiled) to fit in the limited on-chip memory of accelerators. Studies show that an efficient tiling schedule (commonly referred to as a mapping) for a given accelerator and DNN model reduces data movement between the accelerator and the different levels of the memory hierarchy, improving performance. However, finding a layer-wise optimal mapping for a target architecture within a given energy and latency envelope is an open problem due to the huge mapping search space. In this paper, we propose a Reinforcement Learning (RL) based automated mapping approach that finds optimal schedules for DNN layers on a given architecture model without violating the specified energy and latency constraints. The learned policies adapt readily to a wide range of DNN models with different hardware configurations, facilitating transfer learning and reducing training time. Experiments show that the proposed approach reduces latency and energy consumption by averages of 21.5% and 15.6%, respectively, compared to the state-of-the-art genetic-algorithm-based GAMMA approach for a wide range of DNN models running on the NVIDIA Deep Learning Accelerator (NVDLA). Training with RL-based transfer learning is 15× faster than with GAMMA.
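The constraint-aware search the abstract describes can be sketched with a toy example. The snippet below is a minimal illustration only: it uses a simple epsilon-greedy bandit over a small discrete tiling space, and every number in it (layer shape, buffer size, budgets, cost scaling) is an invented assumption, not a value or model from the paper. The paper's actual RL agent, state encoding, and analytical cost model are not reproduced here.

```python
import random

BUFFER_BYTES = 131072  # hypothetical on-chip buffer capacity (assumption)

def cost_model(tile, layer):
    """Crude traffic proxy: activations stream once, but 3x3 conv weights
    are refetched for every tile, so fewer, larger tiles move less data."""
    h, w, c = layer
    th, tw = tile
    num_tiles = -(-h // th) * -(-w // tw)        # ceil(h/th) * ceil(w/tw)
    weight_bytes = 3 * 3 * c * c                 # one weight refetch per tile
    traffic = h * w * c + num_tiles * weight_bytes
    return traffic / 100.0, traffic * 0.05       # (latency, energy) proxies

def reward(tile, layer, lat_budget, en_budget):
    th, tw = tile
    if th * tw * layer[2] > BUFFER_BYTES:        # tile must fit on chip
        return -1e9                              # hard penalty: infeasible
    lat, en = cost_model(tile, layer)
    if lat > lat_budget or en > en_budget:       # envelope violated
        return -1e9
    return -(lat + en)                           # minimize combined cost

def search(layer, tiles, lat_budget, en_budget, episodes=200, eps=0.1):
    """Epsilon-greedy bandit over a discrete tiling space: a stand-in for
    the paper's RL policy over the full per-layer mapping space."""
    q = {t: reward(t, layer, lat_budget, en_budget) for t in tiles}
    n = {t: 1 for t in tiles}
    for _ in range(episodes):
        t = random.choice(tiles) if random.random() < eps else max(q, key=q.get)
        r = reward(t, layer, lat_budget, en_budget)
        n[t] += 1
        q[t] += (r - q[t]) / n[t]                # incremental mean update
    return max(q, key=q.get)

layer = (56, 56, 64)                             # hypothetical conv layer (H, W, C)
tiles = [(7, 7), (14, 14), (28, 28), (56, 56)]
best = search(layer, tiles, lat_budget=5000.0, en_budget=20000.0)
print(best)  # -> (28, 28), the only tiling meeting buffer and budget constraints here
```

In this toy setup the largest tile overflows the buffer and the small tiles exceed the latency budget from repeated weight refetches, so the agent settles on the intermediate tiling; the real problem adds loop ordering, spatial mapping, and per-layer variation, which is what makes the search space large enough to motivate learned policies.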