A 974GOPS/W Multi-level Parallel Architecture for Binary Weight Network Acceleration
Rongdi Sun, Peilin Liu, Cecil Accetti, A. Naqvi, Haroon Ahmed, J. Qian
2018 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4, 27 May 2018. DOI: 10.1109/ISCAS.2018.8351247
Deep neural networks dominate the machine learning field. However, deploying them on mobile devices requires aggressive model compression because of their huge number of parameters. An extreme case restricts weights to the binary values {+1, −1} with little loss of accuracy. This promising method not only reduces the hardware overhead of memory and computation, but also improves the performance of network inference. In this work, a flexible architecture for binary weight network acceleration is proposed. The architecture fully exploits the inherent multi-level parallelism of neural networks, achieving processing-element utilization above 80% across different layers. In addition, we present efficient data placement and transmission methods that coordinate with the multi-level parallel processing. The accelerator is implemented in SMIC 40nm technology; it operates at 1.2V and achieves a power efficiency of up to 974 GOPS/W.
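The efficiency gain from binary weights comes from two effects: each weight needs only one bit of storage (a 32x reduction versus single-precision floats), and every multiply collapses into a sign-conditioned add or subtract. The following C sketch is a minimal illustration of that idea, not the paper's implementation; the function name and bit-packing layout are assumptions made for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch (not the paper's accelerator): a dot product
 * with binary weights {+1, -1}. Each weight is stored as one bit
 * (1 -> +1, 0 -> -1), so weight memory shrinks 32x versus float,
 * and each multiply becomes a sign-conditioned add/subtract. */
static float binary_weight_dot(const float *x, const uint32_t *wbits, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        /* Extract weight bit i from the packed word array. */
        int bit = (wbits[i / 32] >> (i % 32)) & 1;
        acc += bit ? x[i] : -x[i];  /* +x for weight +1, -x for -1 */
    }
    return acc;
}

int main(void)
{
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    uint32_t w[1] = {0x5};  /* bits 0..3 = 1,0,1,0 -> weights +1,-1,+1,-1 */
    printf("%f\n", binary_weight_dot(x, w, 4));  /* 1 - 2 + 3 - 4 = -2 */
    return 0;
}
```

In a hardware accelerator such as the one described above, the same transformation lets each processing element be a simple add/subtract unit rather than a multiplier, which is what makes high power efficiency and dense multi-level parallelism feasible.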