Combined Application of Approximate Computing Techniques in DNN Hardware Accelerators
Authors: Enrico Russo, M. Palesi, Davide Patti, Habiba Lahdhiri, Salvatore Monteleone, G. Ascia, V. Catania
Venue: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Publication date: 2022-05-01
DOI: 10.1109/IPDPSW55747.2022.00013 (https://doi.org/10.1109/IPDPSW55747.2022.00013)
Citations: 1
Abstract
This paper applies Approximate Computing (AC) techniques to the main elements that form a DNN hardware accelerator, namely the computation, communication, and memory subsystems. Specifically, approximate multipliers for computation, link voltage swing reduction for communication, voltage over-scaling for the internal SRAM memory, and lossy compression of the external DRAM memory are considered. The different AC techniques are applied both in isolation and in combination with each other. A set of representative CNN models is mapped onto the approximated hardware accelerators, and the performance vs. energy vs. accuracy trade-offs are derived for the execution of CNN inferences.
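To give a concrete sense of the computation-level approximation the abstract mentions, the following is a minimal sketch of a truncation-based approximate multiplier. It is not the specific multiplier design evaluated in the paper; the operand width and the `drop_bits` parameter are hypothetical and only illustrate how discarding low-order bits trades a bounded error for cheaper partial-product logic.

```python
# Minimal sketch of a truncation-based approximate multiplier.
# Hypothetical illustration only: the paper's actual multiplier design,
# bit widths, and error model may differ.

def approx_multiply(a: int, b: int, drop_bits: int = 4) -> int:
    """Multiply two unsigned integers after zeroing the `drop_bits`
    least-significant bits of each operand, mimicking a multiplier
    that omits the low-order partial products."""
    mask = ~((1 << drop_bits) - 1)
    return (a & mask) * (b & mask)

if __name__ == "__main__":
    exact = 183 * 201
    approx = approx_multiply(183, 201, drop_bits=4)
    rel_err = abs(exact - approx) / exact
    print(f"exact={exact}, approx={approx}, relative error={rel_err:.2%}")
```

In a DNN accelerator such an approximation would typically be applied inside the MAC units of the processing elements, where the induced error is then reflected in the end-to-end inference accuracy measured by the paper.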