{"title":"语义分割模型中稀疏冗余计算的硬件加速器高强度解决方案","authors":"Jiahui Huang;Zhan Li;Yuxian Jiang;Zhihan Zhang;Hao Wang;Sheng Chang","doi":"10.1109/TC.2025.3585354","DOIUrl":null,"url":null,"abstract":"The rapid development of artificial intelligence (AI) has met people's personalized needs. However, with the increase of data capacities and computing requirements, the imbalance between large-scale data transmission and limited network bandwidth has become increasingly prominent. To improve the speed of embedded system, real-time intelligent computing is gradually moving from the cloud to the edge. Traditional FPGA-based AI accelerators mainly utilize PE architecture, but the low computing throughput and resource utilization make it difficult to meet the power requirement of edge AI application scenarios such as image segmentation. In recent years, AI accelerators based on streaming architecture have become a trend, and it is necessary to customize high-performance streaming accelerators for specific segmentation algorithms. In this paper, we design a high-intensity pixel-level fully pipelined accelerator with customized strategies to eliminate the sparse and redundant computations in specific algorithms of semantic segmentation, which significantly improve the accelerator's computing throughput and hardware resources utilization. On Xilinx FPGA, our acceleration of two typical semantic segmentation networks-ESPNet and DeepLabV3, achieves optimized throughputs of 171.3 GOPS and 1324.8 GOPS, and computing efficiency of 9.26 and 9.01, respectively. It provides the possibility of hardware deployment in real-time application with high computing intensity.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 9","pages":"3129-3142"},"PeriodicalIF":3.8000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A High-Intensity Solution of Hardware Accelerator for Sparse and Redundant Computations in Semantic Segmentation Models\",\"authors\":\"Jiahui Huang;Zhan Li;Yuxian Jiang;Zhihan Zhang;Hao Wang;Sheng Chang\",\"doi\":\"10.1109/TC.2025.3585354\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The rapid development of artificial intelligence (AI) has met people's personalized needs. However, with the increase of data capacities and computing requirements, the imbalance between large-scale data transmission and limited network bandwidth has become increasingly prominent. To improve the speed of embedded system, real-time intelligent computing is gradually moving from the cloud to the edge. Traditional FPGA-based AI accelerators mainly utilize PE architecture, but the low computing throughput and resource utilization make it difficult to meet the power requirement of edge AI application scenarios such as image segmentation. In recent years, AI accelerators based on streaming architecture have become a trend, and it is necessary to customize high-performance streaming accelerators for specific segmentation algorithms. In this paper, we design a high-intensity pixel-level fully pipelined accelerator with customized strategies to eliminate the sparse and redundant computations in specific algorithms of semantic segmentation, which significantly improve the accelerator's computing throughput and hardware resources utilization. 
On Xilinx FPGA, our acceleration of two typical semantic segmentation networks-ESPNet and DeepLabV3, achieves optimized throughputs of 171.3 GOPS and 1324.8 GOPS, and computing efficiency of 9.26 and 9.01, respectively. It provides the possibility of hardware deployment in real-time application with high computing intensity.\",\"PeriodicalId\":13087,\"journal\":{\"name\":\"IEEE Transactions on Computers\",\"volume\":\"74 9\",\"pages\":\"3129-3142\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Computers\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11062861/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11062861/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Abstract: The rapid development of artificial intelligence (AI) has made it possible to meet people's personalized needs. However, as data volumes and computing requirements grow, the imbalance between large-scale data transmission and limited network bandwidth has become increasingly prominent. To improve the speed of embedded systems, real-time intelligent computing is gradually moving from the cloud to the edge. Traditional FPGA-based AI accelerators mainly adopt processing-element (PE) architectures, but their low computing throughput and resource utilization make it difficult to meet the power requirements of edge AI scenarios such as image segmentation. In recent years, AI accelerators based on streaming architectures have become a trend, and high-performance streaming accelerators need to be customized for specific segmentation algorithms. In this paper, we design a high-intensity, pixel-level, fully pipelined accelerator with customized strategies that eliminate the sparse and redundant computations in specific semantic segmentation algorithms, which significantly improves the accelerator's computing throughput and hardware resource utilization. On a Xilinx FPGA, our accelerators for two typical semantic segmentation networks, ESPNet and DeepLabV3, achieve optimized throughputs of 171.3 GOPS and 1324.8 GOPS and computing efficiencies of 9.26 and 9.01, respectively. This makes hardware deployment feasible for real-time applications with high computing intensity.
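To put the reported throughput numbers in context, the minimal Python sketch below is a generic illustration, not the accelerator or metric definitions from the paper. It counts the multiply-accumulate (MAC) operations of a toy 3x3 convolution layer, shows how skipping zero-valued inputs removes sparse work, and converts an assumed layer latency into GOPS in the usual way (1 MAC = 2 operations). The feature-map shape, the 50% activation sparsity, and the 5 ms latency are made-up assumptions for illustration only.

```python
# Hypothetical illustration only: toy MAC/GOPS accounting for one convolution
# layer, with and without skipping zero-valued input activations.

def conv_macs(h, w, c_in, c_out, k=3):
    """Dense MAC count of a stride-1, 'same'-padded k x k convolution layer."""
    return h * w * c_in * c_out * k * k

def effective_macs(dense_macs, input_sparsity):
    """MACs remaining after products with a zero input activation are skipped."""
    return int(dense_macs * (1.0 - input_sparsity))

def gops(macs, latency_s):
    """Throughput in GOPS, counting each MAC as one multiply plus one add."""
    return 2 * macs / latency_s / 1e9

if __name__ == "__main__":
    # Assumed toy layer: 256x256 feature map, 64 input and 64 output channels.
    dense = conv_macs(256, 256, 64, 64)

    # Assume ReLU leaves about half of the input activations at zero.
    sparse = effective_macs(dense, input_sparsity=0.5)

    latency = 5e-3  # assumed 5 ms to process the layer
    print(f"dense : {dense / 1e9:.2f} GMAC -> {gops(dense, latency):.1f} GOPS")
    print(f"sparse: {sparse / 1e9:.2f} GMAC of useful work; a zero-skipping "
          f"pipeline needs roughly {1.0 - 0.5:.0%} of the dense cycle count")
```

Whether such savings show up as higher effective throughput or lower resource usage depends on how the pipeline is customized for the target network, which is the design space the paper addresses.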
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.