Design-Space Exploration of Pareto-Optimal Architectures for Deep Learning with DVFS
G. Santoro, M. Casu, Valentino Peluso, A. Calimera, M. Alioto
2018 IEEE International Symposium on Circuits and Systems (ISCAS), May 2018, pp. 1-5. DOI: 10.1109/ISCAS.2018.8351685
Citations: 9
Abstract
Specialized computing engines are required to accelerate the execution of Deep Learning (DL) algorithms in an energy-efficient way. To adapt the processing throughput of these accelerators to the workload requirements while saving power, Dynamic Voltage and Frequency Scaling (DVFS) is the natural solution. However, DL workloads access off-chip memory frequently, which tends to make these accelerators memory-bound rather than computation-bound and hence reduces the effectiveness of DVFS. In this work, we use a performance-power analytical model, fitted on a parametrized implementation of a DL accelerator in a 28-nm FDSOI technology, to explore a large design space and to obtain the Pareto points that maximize the effectiveness of DVFS in the sub-space of throughput and energy efficiency. In our model we consider the impact of the off-chip memory on performance and power, using real data from a commercial low-power DRAM.
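The abstract's core argument, that frequent off-chip DRAM traffic makes the accelerator memory-bound and blunts DVFS, can be illustrated with a small roofline-style model. The sketch below is not the authors' fitted model: every constant (the DVFS operating points, DRAM bandwidth and access energy, effective capacitance, leakage coefficient, and workload operational intensity) is an illustrative assumption. It sweeps a toy design space of MAC-array sizes crossed with DVFS points and keeps the Pareto front in the (throughput, energy-efficiency) plane.

```python
# Minimal, self-contained sketch of a roofline-style performance-power model
# with DVFS, in the spirit of the paper's analysis. All constants are
# illustrative assumptions, NOT the fitted 28-nm FDSOI values from the paper.

DVFS_POINTS = [(0.6, 0.3e9), (0.8, 0.6e9), (1.0, 1.0e9)]  # (V, Hz), hypothetical

DRAM_BW   = 12.8e9   # off-chip bandwidth [B/s], LPDDR-class figure (assumed)
DRAM_E_PB = 40e-12   # DRAM access energy [J/B] (assumed)
OI        = 8.0      # workload operational intensity [ops/B] (assumed)

def throughput(n_mac, f):
    """Attainable throughput [ops/s]: min of compute roof and memory roof."""
    compute_roof = 2.0 * n_mac * f   # 2 ops (mul + add) per MAC per cycle
    memory_roof  = OI * DRAM_BW      # ops/s sustainable by the DRAM traffic
    return min(compute_roof, memory_roof)

def power(n_mac, v, f, ops):
    """Total power [W]: switching + leakage + DRAM traffic (toy coefficients)."""
    p_dyn  = (1e-12 * n_mac) * v**2 * f   # alpha * C_eff * V^2 * f, C_eff assumed
    p_leak = 1e-4 * n_mac * v             # crude leakage proxy (assumed)
    p_dram = (ops / OI) * DRAM_E_PB       # energy cost of the implied byte traffic
    return p_dyn + p_leak + p_dram

# Sweep a toy design space: MAC-array size x DVFS operating point.
designs = []
for n_mac in (64, 256, 1024, 4096):
    for v, f in DVFS_POINTS:
        t = throughput(n_mac, f)
        designs.append((t, t / power(n_mac, v, f, t), n_mac, v, f))

# Pareto front in (throughput [ops/s], efficiency [ops/J]): drop dominated points.
pareto = [d for d in designs
          if not any(o[0] >= d[0] and o[1] >= d[1] and o is not d
                     for o in designs)]

for t, eff, n_mac, v, f in sorted(pareto):
    print(f"MACs={n_mac:4d} V={v:.1f} f={f/1e9:.1f}GHz "
          f"-> {t/1e9:6.1f} Gop/s, {eff/1e9:5.1f} Gop/J")
```

Even with these toy numbers, the sweep reproduces the qualitative effect the paper targets: once the array is large enough to hit the memory roof, raising V and f adds switching power without adding throughput, which is exactly the regime in which DVFS loses effectiveness and the Pareto front flattens.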