Coarse-Grained High-speed Reconfigurable Array-based Approximate Accelerator for Deep Learning Applications

Katherine Mercado, Sathwika Bavikadi, Sai Manoj Pudukotai Dinakarrao

2023 57th Annual Conference on Information Sciences and Systems (CISS), published 2023-03-22. DOI: 10.1109/CISS56502.2023.10089735

Citations: 0
Abstract
Deep Neural Networks (DNNs) are widely deployed in cognitive applications, including computer vision, speech recognition, and image processing. The superior accuracy and performance of DNNs come at the cost of high computational complexity; as a result, software implementations of DNNs and Convolutional Neural Networks (CNNs) are often hindered by computational and communication bottlenecks. To overcome these bottlenecks, numerous hardware accelerators for DNNs and CNNs have been introduced in recent years. Despite their effectiveness, existing hardware accelerators are often limited by their computational complexity and by the need for dedicated hardware units to implement each DNN/CNN operation. To address these challenges, this work proposes a reconfigurable DNN/CNN accelerator. The proposed architecture comprises nine processing elements (PEs) that can perform both convolution and arithmetic operations through run-time reconfiguration with minimal overhead. To reduce computational complexity, we employ Mitchell's algorithm, supported by low-overhead coarse-grained reconfigurability. To facilitate efficient data flow across the PEs, we pre-compute the dataflow paths and configure the dataflow at run-time. The proposed design is realized on a field-programmable gate array (FPGA) platform for evaluation. The evaluation indicates 1.26x lower resource utilization than state-of-the-art DNN/CNN accelerators, while achieving 99.43% and 82% accuracy on the MNIST and CIFAR-10 datasets, respectively.
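The abstract does not detail how the PEs realize Mitchell's algorithm in hardware, but the algorithm itself — approximate multiplication via piecewise-linear logarithms — can be sketched in software. The following Python sketch (function names are illustrative, not from the paper) shows the log-add-antilog flow that the accelerator's arithmetic mode presumably approximates:

```python
def mitchell_log2(n: int) -> float:
    """Mitchell's piecewise-linear approximation of log2(n) for n > 0.

    The characteristic is the MSB position; the mantissa is approximated
    linearly as (n - 2^k) / 2^k instead of using a true logarithm.
    """
    k = n.bit_length() - 1          # position of the most significant bit
    frac = (n - (1 << k)) / (1 << k)  # linear mantissa approximation in [0, 1)
    return k + frac


def mitchell_multiply(a: int, b: int) -> float:
    """Approximate a * b: add the approximate logs, then take the antilog.

    The antilog reuses the same piecewise-linear model, so the whole
    operation needs only shifts and adds in hardware (no multiplier).
    """
    s = mitchell_log2(a) + mitchell_log2(b)
    k = int(s)                      # integer part -> power-of-two shift
    frac = s - k                    # fractional part -> linear correction
    return (1 << k) * (1 + frac)
```

For example, `mitchell_multiply(6, 6)` yields 32 against the exact 36: Mitchell's method never overestimates, and its relative error is bounded at roughly 11.1%, which is the accuracy/complexity trade-off the accelerator exploits.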