Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference
Authors: Chen Zhang; Shijie Cao; Guohao Dai; Chenbo Geng; Zhuliang Yao; Wencong Xiao; Yunxin Liu; Ming Wu; Lintao Zhang; Guangyu Sun; Zhigang Ji; Runsheng Wang; Ru Huang
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 7, pp. 2544-2557
Published: 2024-12-31
DOI: 10.1109/TCAD.2024.3524356
URL: https://ieeexplore.ieee.org/document/10818746/
Impact factor: 2.9 (JCR Q2, Computer Science, Hardware & Architecture)
With the explosive growth in the number of parameters in deep neural networks (DNNs), sparsity-centric algorithm and hardware designs have become critical for low-latency AI serving systems. However, the inherent randomness of pruning methods often leads to fragmented data access and irregular computation patterns in sparse matrices, which significantly reduces hardware efficiency. Balancing the ‘randomness’ required to maintain model accuracy against the ‘regularity’ needed for efficient hardware design is crucial for realizing effective sparse computing in AI. This article proposes a fine-grained structured sparsity (FSS) paradigm. The pruned sparse matrices in this paradigm exhibit both ‘local randomness’ and ‘global regularity’. This dual-feature design allows AI accelerator hardware based on the FSS paradigm to maintain both high model accuracy and efficient hardware design. We implemented this novel accelerator on the Xilinx Alveo U280 and validated our concept on three classes of AI models (CNN, RNN, and LLM), demonstrating performance that significantly outperforms prior methods.
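The abstract does not spell out the FSS pruning rule, so the following is only an illustrative sketch of how a sparsity pattern can combine ‘local randomness’ with ‘global regularity’. It assumes a block-wise scheme in which each weight row is split into equal-size blocks and the same number of largest-magnitude weights is kept in every block. The function names, `block_size`, and `keep` are hypothetical and are not taken from the article.

```python
def prune_blockwise(row, block_size, keep):
    """Keep the `keep` largest-magnitude weights in each block; zero the rest.

    Assumed scheme for illustration only, not the article's algorithm.
    """
    if len(row) % block_size != 0:
        raise ValueError("row length must be a multiple of block_size")
    pruned = [0.0] * len(row)
    for start in range(0, len(row), block_size):
        block = range(start, start + block_size)
        # Local randomness: any positions inside the block may survive,
        # chosen purely by weight magnitude.
        survivors = sorted(block, key=lambda i: abs(row[i]), reverse=True)[:keep]
        for i in survivors:
            pruned[i] = row[i]
    return pruned


def nonzeros_per_block(row, block_size):
    """Count the nonzero entries in each block of the row."""
    return [
        sum(1 for v in row[s:s + block_size] if v != 0)
        for s in range(0, len(row), block_size)
    ]


weights = [0.9, -0.1, 0.05, 0.7, 0.2, -0.8, 0.3, 0.01]
pruned = prune_blockwise(weights, block_size=4, keep=2)
print(pruned)                         # [0.9, 0.0, 0.0, 0.7, 0.0, -0.8, 0.3, 0.0]
# Global regularity: every block holds exactly `keep` nonzeros.
print(nonzeros_per_block(pruned, 4))  # [2, 2]
```

Under this assumption, every block contributes the same amount of work, so parallel compute lanes stay balanced, while the surviving positions inside a block remain unconstrained.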
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.