Raghav Mehta, Yuyang Huang, Mingxi Cheng, S. Bagga, Nishant Mathur, Ji Li, J. Draper, Shahin Nazarian
{"title":"基于流水线硬件加速和分布式内存的深度神经网络高性能训练","authors":"Raghav Mehta, Yuyang Huang, Mingxi Cheng, S. Bagga, Nishant Mathur, Ji Li, J. Draper, Shahin Nazarian","doi":"10.1109/ISQED.2018.8357317","DOIUrl":null,"url":null,"abstract":"Recently, Deep Neural Networks (DNNs) have made unprecedented progress in various tasks. However, there is a timely need to accelerate the training process in DNNs specifically for real-time applications that demand high performance, energy efficiency and compactness. Numerous algorithms have been proposed to improve the accuracy, however the network training process is computationally slow. In this paper, we present a scalable pipelined hardware architecture with distributed memories for a digital neuron to implement deep neural networks. We also explore various functions and algorithms as well as different memory topologies, to optimize the performance of our training architecture. The power, area, and delay of our proposed model are evaluated with respect to software implementation. Experimental results on the MNIST dataset demonstrate that compared with the software training, our proposed hardware-based approach for training process achieves 33X runtime reduction, 5X power reduction, and nearly 168X energy reduction.","PeriodicalId":213351,"journal":{"name":"2018 19th International Symposium on Quality Electronic Design (ISQED)","volume":"28 6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"High performance training of deep neural networks using pipelined hardware acceleration and distributed memory\",\"authors\":\"Raghav Mehta, Yuyang Huang, Mingxi Cheng, S. Bagga, Nishant Mathur, Ji Li, J. Draper, Shahin Nazarian\",\"doi\":\"10.1109/ISQED.2018.8357317\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, Deep Neural Networks (DNNs) have made unprecedented progress in various tasks. However, there is a timely need to accelerate the training process in DNNs specifically for real-time applications that demand high performance, energy efficiency and compactness. Numerous algorithms have been proposed to improve the accuracy, however the network training process is computationally slow. In this paper, we present a scalable pipelined hardware architecture with distributed memories for a digital neuron to implement deep neural networks. We also explore various functions and algorithms as well as different memory topologies, to optimize the performance of our training architecture. The power, area, and delay of our proposed model are evaluated with respect to software implementation. 
Experimental results on the MNIST dataset demonstrate that compared with the software training, our proposed hardware-based approach for training process achieves 33X runtime reduction, 5X power reduction, and nearly 168X energy reduction.\",\"PeriodicalId\":213351,\"journal\":{\"name\":\"2018 19th International Symposium on Quality Electronic Design (ISQED)\",\"volume\":\"28 6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 19th International Symposium on Quality Electronic Design (ISQED)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISQED.2018.8357317\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 19th International Symposium on Quality Electronic Design (ISQED)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISQED.2018.8357317","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
High performance training of deep neural networks using pipelined hardware acceleration and distributed memory
Recently, Deep Neural Networks (DNNs) have made unprecedented progress on a wide range of tasks. However, there is a pressing need to accelerate DNN training, particularly for real-time applications that demand high performance, energy efficiency, and compactness. Numerous algorithms have been proposed to improve accuracy, but the network training process remains computationally slow. In this paper, we present a scalable pipelined hardware architecture with distributed memories for a digital neuron to implement deep neural networks. We also explore various functions and algorithms, as well as different memory topologies, to optimize the performance of our training architecture. The power, area, and delay of the proposed model are evaluated against a software implementation. Experimental results on the MNIST dataset demonstrate that, compared with software training, our proposed hardware-based training approach achieves a 33X runtime reduction, a 5X power reduction, and a nearly 168X energy reduction.
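The abstract does not detail the neuron datapath, but a common way to realize such a design is a multiply-accumulate (MAC) pipeline in which each digital neuron holds its weights in a local (distributed) memory and overlaps the multiply and accumulate stages across cycles. The Python sketch below is purely illustrative and is not the architecture from the paper; the class and method names (PipelinedNeuron, step, activation) are hypothetical, and ReLU is an assumed placeholder activation.

```python
# Illustrative, cycle-level model of a pipelined digital neuron with a
# local (distributed) weight memory. This is a sketch of the general
# technique, not the design proposed in the paper.

from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelinedNeuron:
    weights: list            # weights held in the neuron's local memory
    bias: float = 0.0
    mul_reg: Optional[float] = None   # pipeline register after the multiply stage
    acc: float = 0.0                  # running sum in the accumulate stage

    def step(self, x: Optional[float], idx: Optional[int]) -> None:
        """Advance the pipeline by one cycle.

        The accumulate stage consumes the product produced on the previous
        cycle while the multiply stage works on this cycle's input, so the
        two stages overlap in time -- the point of pipelining.
        """
        # Stage 2: accumulate last cycle's product (skip on a bubble).
        if self.mul_reg is not None:
            self.acc += self.mul_reg
        # Stage 1: fetch the weight from local memory and multiply.
        if x is not None and idx is not None:
            self.mul_reg = self.weights[idx] * x
        else:
            self.mul_reg = None

    def activation(self) -> float:
        """Final stage: apply the activation once all taps are accumulated."""
        z = self.acc + self.bias
        return z if z > 0.0 else 0.0  # ReLU chosen only for illustration


if __name__ == "__main__":
    neuron = PipelinedNeuron(weights=[0.5, -1.0, 2.0], bias=0.1)
    inputs = [1.0, 2.0, 3.0]
    # Feed one input per cycle, then one extra cycle to drain the pipeline.
    for i, x in enumerate(inputs):
        neuron.step(x, i)
    neuron.step(None, None)
    print(neuron.activation())  # 0.5*1 - 1*2 + 2*3 + 0.1 = 4.6
```

Keeping each neuron's weights in its own small memory, as modeled by the `weights` list above, is one way to avoid contention on a shared memory and keep every pipeline stage fed each cycle; the paper's actual memory topologies and training datapath are described in the full text.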