CASCADE: A Framework for CNN Accelerator Synthesis With Concatenation and Refreshing Dataflow
Qingyu Guo; Haoyang Luo; Meng Li; Xiyuan Tang; Yuan Wang
IEEE Transactions on Circuits and Systems I: Regular Papers (published 2024-10-01). DOI: 10.1109/TCSI.2024.3452954. https://ieeexplore.ieee.org/document/10701568/
Layer Pipeline (LP) is an innovative architecture for neural network accelerators that implements task-level pipelining at the granularity of layers. Despite their improved throughput, LP architectures face challenges from complicated dataflow design, an intricate design space, and high resource requirements. In this paper, we introduce CASCADE, an accelerator synthesis framework. CASCADE leverages a novel dataflow, CARD, to efficiently handle the irregular memory access patterns of convolutional operations using simplified logic and minimal buffers. It also employs advanced design space exploration methods to automatically optimize the unrolling parallelism and FIFO depth of each layer. Finally, to further improve resource efficiency, CASCADE uses Lookup Table (LUT)-based multiply-accumulate units. Extensive experimental results show that CASCADE significantly outperforms existing works, achieving a $3\times$ improvement in resource efficiency and a $4\times$ improvement in power efficiency. It reaches a throughput of over $1.5\times 10^{4}$ frames per second with 71.9% accuracy on ImageNet.
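The abstract says CASCADE's design space exploration sets the unrolling parallelism of each layer automatically, but it does not describe the search method. The Python sketch below is therefore only a hypothetical illustration of the underlying trade-off. It assumes an idealized layer pipeline in which each layer is a stage needing ceil(MACs / parallelism) cycles per frame, and in which steady-state throughput is set by the slowest stage. The names `stage_cycles`, `pipeline_fps` and `balance_unrolling`, the greedy doubling heuristic, and the MAC-unit budget are all illustrative assumptions, not CASCADE's algorithm. FIFO depths, the CARD dataflow and LUT-based MACs are not modeled.

```python
from math import ceil


def stage_cycles(macs: int, parallelism: int) -> int:
    """Idealized cycles per frame for one layer stage that performs
    `parallelism` multiply-accumulates per cycle (no stalls assumed)."""
    return ceil(macs / parallelism)


def pipeline_fps(layer_macs: list[int], par: list[int], clock_hz: int) -> float:
    """In a layer pipeline every layer works on a different frame at the
    same time, so steady-state throughput is set by the slowest stage."""
    bottleneck = max(stage_cycles(m, p) for m, p in zip(layer_macs, par))
    return clock_hz / bottleneck


def balance_unrolling(layer_macs: list[int], mac_budget: int) -> list[int]:
    """Hypothetical greedy search: start every layer at parallelism 1,
    then repeatedly double the parallelism of the current bottleneck
    layer while the total number of MAC units fits in `mac_budget`."""
    par = [1] * len(layer_macs)
    if sum(par) > mac_budget:
        raise ValueError("budget is smaller than one MAC unit per layer")
    while True:
        cycles = [stage_cycles(m, p) for m, p in zip(layer_macs, par)]
        slowest = max(range(len(par)), key=lambda i: cycles[i])
        # Doubling adds par[slowest] more MAC units to that layer.
        if sum(par) + par[slowest] > mac_budget or cycles[slowest] == 1:
            # Stop: the bottleneck cannot grow, so under this model
            # throughput cannot improve any further.
            return par
        par[slowest] *= 2
```

Consider per-layer MAC counts of [1000, 4000, 2000] and a budget of 16 MAC units. The loop ends with parallelism [4, 8, 4], so no stage takes more than 500 cycles. At an assumed 200 MHz clock, `pipeline_fps` then gives 400,000 frames per second. Doubling, rather than incrementing, keeps the unroll factors at powers of two, which tends to simplify loop-bound and addressing logic. A real exploration would also weigh non-power-of-two factors, per-layer resource costs, and buffering.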