A Design Framework for Generating Energy-Efficient Accelerator on FPGA Toward Low-Level Vision
Authors: Zikang Zhou; Xuyang Duan; Jun Han
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems (impact factor 2.8; JCR Q2, Computer Science, Hardware & Architecture)
Publication date: 2024-06-17
Publication type: Journal Article
DOI: 10.1109/TVLSI.2024.3409649
URL: https://ieeexplore.ieee.org/document/10559268/
Low-level vision algorithms play an increasingly crucial role in a wide range of applications, such as biomedical, security, and autopilot. Accelerators for low-level vision have also been extensively researched. As low-level vision is often deployed in embedded devices, its accelerators need to achieve high energy efficiency. Meanwhile, the broad application scenarios of low-level vision contribute to its rapid iteration. Designing energy-efficient accelerators for quickly evolving low-level vision algorithms demands substantial effort. Therefore, a design framework specifically tailored for the generation of low-level vision accelerators is urgently needed. In this article, we propose an end-to-end algorithm-hardware generation framework, EffiVision, on field-programmable gate array (FPGA), aimed at generating highly energy-efficient dedicated accelerators for low-level vision neural networks. EffiVision proposes a hardware template that features multiple parallelisms and a large architecture exploration space specifically designed to accommodate the characteristics of low-level vision networks. Then, it employs activation-weight aware mixed-precision quantization and FPGA-aware NNLUTs to search for suitable hardware parameters within the hardware template, generating highly energy-efficient accelerators tailored for low-level vision networks. We used EffiVision to perform hardware generation for three low-level vision neural networks on Xilinx FPGA development boards: the fast super-resolution convolutional neural network (FSRCNN), the denoising convolutional neural network (DnCNN), and the demosaicing convolutional neural network (DMCNN), achieving best energy efficiencies of 174.9, 97.8, and 92.7 GOPS/W, respectively. The generated accelerators for FSRCNN and DnCNN are $1.11\times$ and $3.37\times$ more efficient than previous works.
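The energy-efficiency figures above are given in GOPS/W, that is, billions of operations per second divided by power in watts. The following is a minimal sketch of that metric, not the authors' measurement procedure. The function names and the example workload (12 GOP per frame, 20 ms, 4 W) are hypothetical, and the baseline calculation assumes that "N× more efficient" means the ratio of reported to prior GOPS/W.

```python
def gops_per_watt(total_ops: float, latency_s: float, power_w: float) -> float:
    """Energy efficiency in GOPS/W: (operations per second / 1e9) / watts."""
    if latency_s <= 0 or power_w <= 0:
        raise ValueError("latency_s and power_w must be positive")
    gops = total_ops / latency_s / 1e9
    return gops / power_w


def implied_baseline(reported: float, ratio: float) -> float:
    """Prior-work efficiency implied by a 'ratio x more efficient' claim.

    Assumption: ratio = reported efficiency / baseline efficiency.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return reported / ratio


# Hypothetical workload (not from the paper): 12 GOP per frame,
# 20 ms per frame, 4 W board power.
example = gops_per_watt(total_ops=12e9, latency_s=0.020, power_w=4.0)

# Efficiencies (GOPS/W) and improvement ratios as stated in the abstract.
reported = {"FSRCNN": (174.9, 1.11), "DnCNN": (97.8, 3.37)}
baselines = {name: implied_baseline(eff, r) for name, (eff, r) in reported.items()}

if __name__ == "__main__":
    print(f"hypothetical example: {example:.1f} GOPS/W")
    for name, value in baselines.items():
        print(f"{name}: implied prior-work efficiency {value:.1f} GOPS/W")
```

Under the stated ratio assumption, the implied baselines only indicate the rough efficiency of the prior designs being compared against; the paper itself should be consulted for the actual reference figures.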
About the journal:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chips and wafer fabrication, testing and packaging, and systems-level qualification. Thus, the coverage of these Transactions focuses on VLSI/ULSI microelectronic systems integration.