An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs

Impact factor: 1.8 · CAS Zone 3 (Computer Science) · JCR Q4 (Computer Science, Hardware & Architecture)

ACM Transactions on Architecture and Code Optimization · Pub Date: 2024-01-08 · DOI: 10.1145/3639823

Fareed Qararyah, Muhammad Waqar Azhar, Pedro Trancoso


Citations: 0

Abstract


An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs

Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention. These CNNs have relatively low computational and memory requirements. A common denominator among such CNNs is that they are more heterogeneous than traditional CNNs. This heterogeneity is present at two levels: intra-layer-type and inter-layer-type. Generic accelerators do not capture these levels of heterogeneity, which harms their efficiency. Consequently, researchers have proposed model-specific accelerators with dedicated engines. When designing an accelerator with dedicated engines, one option is to dedicate an engine per CNN layer. We refer to accelerators designed with this approach as single-engine single-layer (SESL). This approach enables optimizing each engine for its specific layer. However, such accelerators are resource-demanding and unscalable. Another option is to design a minimal number of dedicated engines such that each engine handles all layers of one type. We refer to these accelerators as single-engine multiple-layer (SEML). SEML accelerators capture the inter-layer-type heterogeneity but not the intra-layer-type heterogeneity.
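The difference between the two dedicated-engine designs can be illustrated with a toy engine count. This is only a sketch: the layer list is a hypothetical MobileNet-style fragment, and the function names are illustrative rather than taken from the paper. It counts engines and says nothing about how the engines themselves are built.

```python
from collections import Counter

# Hypothetical layer sequence as (layer type, kernel size); not from the paper.
LAYERS = [
    ("conv", 3),
    ("pointwise", 1),
    ("depthwise", 3),
    ("pointwise", 1),
    ("pointwise", 1),
    ("depthwise", 5),
    ("pointwise", 1),
]


def sesl_engine_count(layers):
    """SESL: one dedicated engine per layer."""
    return len(layers)


def seml_engine_count(layers):
    """SEML: one engine per layer type, shared by all layers of that type."""
    return len({layer_type for layer_type, _ in layers})


def layers_per_seml_engine(layers):
    """Number of layers each shared SEML engine has to serve."""
    return Counter(layer_type for layer_type, _ in layers)


print(sesl_engine_count(LAYERS))
print(seml_engine_count(LAYERS))
print(dict(layers_per_seml_engine(LAYERS)))
```

In this toy example the single depthwise engine of the SEML design has to serve both a 3x3 and a 5x5 layer. Differences of that kind among layers of the same type are one plausible reading of the intra-layer-type heterogeneity that an SEML design does not capture.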

We propose FiBHA (Fixed Budget Hybrid CNN Accelerator), a hybrid accelerator composed of an SESL part and an SEML part, each processing a subset of CNN layers. FiBHA captures more heterogeneity than SEML accelerators while being more resource-aware and scalable than SESL accelerators. Moreover, we propose a novel module, Fused Inverted Residual Bottleneck (FIRB), a fine-grained and memory-light SESL architecture building block. The proposed architecture is implemented and evaluated using high-level synthesis (HLS) on different FPGAs representing various resource budgets. Our evaluation shows that FiBHA improves the throughput by up to 4x and 2.5x compared to state-of-the-art SESL and SEML accelerators, respectively. Moreover, FiBHA reduces memory and energy consumption compared to an SEML accelerator. The evaluation also shows that FIRB reduces the required memory by up to 54% and energy requirements by up to 35% compared to traditional pipelining.
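The resource trade-off behind a hybrid design can be sketched as a simple engine count. The abstract does not say how FiBHA chooses which layers go to which part, so the prefix split below (the first `num_sesl_layers` layers get dedicated engines, the rest share one engine per type) is purely an assumption for illustration, as are the function name and the layer list.

```python
def hybrid_engine_count(layer_types, num_sesl_layers):
    """Count engines for a hypothetical hybrid split.

    The first `num_sesl_layers` layers each get a dedicated engine (SESL part);
    the remaining layers share one engine per layer type (SEML part).
    The prefix split is an illustrative assumption, not the paper's method.
    """
    if not 0 <= num_sesl_layers <= len(layer_types):
        raise ValueError("num_sesl_layers out of range")
    sesl_part = layer_types[:num_sesl_layers]
    seml_part = layer_types[num_sesl_layers:]
    return len(sesl_part) + len(set(seml_part))


# Hypothetical layer-type sequence; not taken from the paper.
layer_types = [
    "conv", "pointwise", "depthwise", "pointwise",
    "pointwise", "depthwise", "pointwise",
]

# Sweep the split point from pure SEML (0) to pure SESL (all layers).
for k in range(len(layer_types) + 1):
    print(k, hybrid_engine_count(layer_types, k))
```

Under these assumptions, moving the split point trades engine count (a rough proxy for resource use) against how many layers get an engine tailored to them, which is the balance a fixed-budget hybrid has to strike.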


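Source journal

ACM Transactions on Architecture and Code Optimization (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 3.60

Self-citation rate: 6.20%

Articles published:

Review time: 6-12 weeks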

About the journal: ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.