Exploiting Near-Memory Processing Architectures for Bayesian Neural Networks Acceleration
Yinglin Zhao, Jianlei Yang, Xiaotao Jia, Xueyan Wang, Zhaohao Wang, W. Kang, Youguang Zhang, Weisheng Zhao
2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 203-206, 2019-07-01
DOI: 10.1109/ISVLSI.2019.00045 (https://doi.org/10.1109/ISVLSI.2019.00045)
Abstract
Bayesian inference is an effective way to capture model uncertainty and to mitigate over-fitting in deep neural networks. Bayesian neural networks (BNNs) have recently grown in popularity and have succeeded in many recognition tasks. However, BNN inference requires numerous memory accesses because each prediction evaluates many sampled networks. This paper proposes a near-memory architecture that accelerates BNN inference by placing additional memory units near the processing units. These units cache frequently accessed data, which reduces the expensive data movement between memory units and computation units and, in turn, lowers latency and energy consumption. Compared with the traditional approach, simulation results show that the proposed architecture reduces energy consumption by 9% and achieves a 1.6× speedup at the cost of a 4% area overhead.
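To illustrate why sampled networks drive up memory traffic, here is a minimal Python sketch of Monte Carlo BNN inference for a single linear layer with Gaussian weights. It is not the paper's architecture or simulator. The function names, the layer shape, and the read-counting model in `parameter_reads` are all assumptions made for illustration.

```python
import random


def sample_weights(mu, sigma, rng):
    """Draw one weight matrix w = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu_row, sigma_row)]
            for mu_row, sigma_row in zip(mu, sigma)]


def forward(weights, x):
    """Single linear layer: y[j] = sum_i weights[j][i] * x[i]."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def bnn_predict(mu, sigma, x, num_samples, seed=0):
    """Average the outputs of num_samples sampled networks (Monte Carlo)."""
    rng = random.Random(seed)
    total = [0.0] * len(mu)
    for _ in range(num_samples):
        # Every sample touches the mean and deviation of every weight.
        y = forward(sample_weights(mu, sigma, rng), x)
        total = [t + yi for t, yi in zip(total, y)]
    return [t / num_samples for t in total]


def parameter_reads(num_weights, num_samples, cached_near_compute):
    """Toy count of far-memory reads of (mu, sigma) values.

    Assumption: without a buffer near the processing units, every sample
    re-reads both parameters of every weight; with one, they are fetched
    once and reused for all samples.
    """
    fetches = 1 if cached_near_compute else num_samples
    return 2 * num_weights * fetches
```

Under this toy model, a layer with 4 weights evaluated over 10 samples needs `parameter_reads(4, 10, False)` = 80 far-memory reads, against `parameter_reads(4, 10, True)` = 8 when the parameters stay near the compute units. The energy and speedup figures reported in the abstract come from the authors' simulation, not from this count.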