A Hardware Accelerator for Edge Detection in High-Definition Video using Cellular Neural Networks
Ignacio Pérez, Wladimir E. Valenzuela, M. Figueroa
2019 22nd Euromicro Conference on Digital System Design (DSD), 2019-08-01
DOI: 10.1109/DSD.2019.00017 (https://doi.org/10.1109/DSD.2019.00017)
Citations: 0
Abstract
This paper presents the architecture of a hardware accelerator for a cellular neural network (CeNN), applied to real-time edge detection on visible-range and infrared video. The accelerator uses fully pipelined processing elements (PEs) that exploit the data parallelism of the algorithm to perform one iteration of the CeNN on a stream of video data at high throughput. The memory architecture exploits the locality of reference in the CeNN, so each PE needs only 5 line buffers to store pixel, state, and output data, which keeps on-chip memory utilization low. Implemented on a Xilinx XC7A200T FPGA running at 245 MHz, the accelerator performs edge detection on 1080p video using a single CeNN iteration with a throughput of 118 frames per second (fps), a total latency of 15.7 µs, and a power consumption of 618 mW. The architecture supports static reconfiguration to store built-in kernels and to add more PEs for multiple iterations of the CeNN algorithm. Additional kernels can be added dynamically through a serial interface.
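
As a rough software illustration of what a single CeNN iteration computes, the sketch below applies one forward-Euler step of the standard CeNN state equation to a small binary image. It is not the paper's hardware design: the abstract does not list the kernels, so the feedback template `A`, control template `B`, bias `z`, step size `h`, zero initial state, and white (-1) border padding are all assumptions chosen for illustration, and the function names are hypothetical. The helper `frames_per_second` additionally assumes one pixel per clock cycle with no blanking interval, which the abstract does not state, although under that assumption the result agrees with the reported 118 fps.

```python
def saturate(x):
    """Standard CeNN piecewise-linear output: y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def weighted_sum_3x3(img, kernel, r, c, pad):
    """Sum of kernel * img over the 3x3 neighborhood of (r, c); `pad` fills the border."""
    rows, cols = len(img), len(img[0])
    acc = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            acc += kernel[dr + 1][dc + 1] * (img[rr][cc] if inside else pad)
    return acc


def cenn_iteration(u, x, A, B, z, h=1.0, pad=-1.0):
    """One forward-Euler step: x' = x + h * (-x + A*y + B*u + z), with y = saturate(x)."""
    rows, cols = len(u), len(u[0])
    y = [[saturate(v) for v in row] for row in x]
    x_next = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            feedback = weighted_sum_3x3(y, A, r, c, pad)  # depends on previous output
            control = weighted_sum_3x3(u, B, r, c, pad)   # depends on input pixels
            x_next[r][c] = x[r][c] + h * (-x[r][c] + feedback + control + z)
    y_next = [[saturate(v) for v in row] for row in x_next]
    return x_next, y_next


def frames_per_second(clock_hz, width, height, pixels_per_cycle=1):
    """Frame rate of a streaming pipeline, ASSUMING no blanking or stalls."""
    return clock_hz * pixels_per_cycle / (width * height)


# Illustrative edge-detection templates (assumed, not taken from the paper).
A = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
B = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
Z = -1.0

# 5x5 test image: a 3x3 black (+1) square on a white (-1) background.
U = [[1.0 if 1 <= r <= 3 and 1 <= c <= 3 else -1.0 for c in range(5)]
     for r in range(5)]
X0 = [[0.0] * 5 for _ in range(5)]  # assumed zero initial state

_, EDGES = cenn_iteration(U, X0, A, B, Z)
FPS_1080P = frames_per_second(245e6, 1920, 1080)
```

With these assumed templates, `EDGES` marks the outline of the square as +1 while the interior and the background settle at -1. Unlike the accelerator, which streams pixels through line buffers, this sketch holds the whole frame in memory and is meant only to show the arithmetic of one iteration.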