G. Santoro, M. Casu, Valentino Peluso, A. Calimera, M. Alioto
{"title":"基于DVFS的深度学习pareto最优架构的设计空间探索","authors":"G. Santoro, M. Casu, Valentino Peluso, A. Calimera, M. Alioto","doi":"10.1109/ISCAS.2018.8351685","DOIUrl":null,"url":null,"abstract":"Specialized computing engines are required to accelerate the execution of Deep Learning (DL) algorithms in an energy-efficient way. To adapt the processing throughput of these accelerators to the workload requirements while saving power, Dynamic Voltage and Frequency Scaling (DVFS) seems the natural solution. However, DL workloads need to frequently access the off-chip memory, which tends to make the performance of these accelerators memory-bound rather than computation-bound, hence reducing the effectiveness of DVFS. In this work we use a performance-power analytical model fitted on a parametrized implementation of a DL accelerator in a 28-nm FDSOI technology to explore a large design space and to obtain the Pareto points that maximize the effectiveness of DVFS in the sub-space of throughput and energy efficiency. In our model we consider the impact on performance and power of the off-chip memory using real data of a commercial low-power DRAM.","PeriodicalId":6569,"journal":{"name":"2018 IEEE International Symposium on Circuits and Systems (ISCAS)","volume":"176 1","pages":"1-5"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":"{\"title\":\"Design-Space Exploration of Pareto-Optimal Architectures for Deep Learning with DVFS\",\"authors\":\"G. Santoro, M. Casu, Valentino Peluso, A. Calimera, M. Alioto\",\"doi\":\"10.1109/ISCAS.2018.8351685\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Specialized computing engines are required to accelerate the execution of Deep Learning (DL) algorithms in an energy-efficient way. To adapt the processing throughput of these accelerators to the workload requirements while saving power, Dynamic Voltage and Frequency Scaling (DVFS) seems the natural solution. However, DL workloads need to frequently access the off-chip memory, which tends to make the performance of these accelerators memory-bound rather than computation-bound, hence reducing the effectiveness of DVFS. In this work we use a performance-power analytical model fitted on a parametrized implementation of a DL accelerator in a 28-nm FDSOI technology to explore a large design space and to obtain the Pareto points that maximize the effectiveness of DVFS in the sub-space of throughput and energy efficiency. In our model we consider the impact on performance and power of the off-chip memory using real data of a commercial low-power DRAM.\",\"PeriodicalId\":6569,\"journal\":{\"name\":\"2018 IEEE International Symposium on Circuits and Systems (ISCAS)\",\"volume\":\"176 1\",\"pages\":\"1-5\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-05-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"9\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Symposium on Circuits and Systems (ISCAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISCAS.2018.8351685\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Symposium on Circuits and Systems (ISCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISCAS.2018.8351685","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Design-Space Exploration of Pareto-Optimal Architectures for Deep Learning with DVFS
Specialized computing engines are required to accelerate the execution of Deep Learning (DL) algorithms in an energy-efficient way. To adapt the processing throughput of these accelerators to the workload requirements while saving power, Dynamic Voltage and Frequency Scaling (DVFS) seems the natural solution. However, DL workloads need to frequently access the off-chip memory, which tends to make the performance of these accelerators memory-bound rather than computation-bound, hence reducing the effectiveness of DVFS. In this work we use a performance-power analytical model fitted on a parametrized implementation of a DL accelerator in a 28-nm FDSOI technology to explore a large design space and to obtain the Pareto points that maximize the effectiveness of DVFS in the sub-space of throughput and energy efficiency. In our model we consider the impact on performance and power of the off-chip memory using real data of a commercial low-power DRAM.