Raccoon: Lightweight Support for Comprehensive Control Flows in Reconfigurable Spatial Architectures

Xiangyu Kong; Yi Huang; Longlong Chen; Jianfeng Zhu; Liangwei Li; Xingchen Man; Mingyu Gao; Shaojun Wei; Leibo Liu

IEEE Transactions on Parallel and Distributed Systems, vol. 36, no. 6, pp. 1294-1310
DOI: 10.1109/TPDS.2025.3561145
Published: 2025-04-15
URL: https://ieeexplore.ieee.org/document/10965538/
JCR: Q1 (Computer Science, Theory & Methods)
Citations: 0
Abstract
Coarse-grained reconfigurable arrays (CGRAs) have emerged as promising candidates for digital signal processing, biomedical, and automotive applications, where energy efficiency and flexibility are paramount. Yet existing CGRAs suffer from the Amdahl bottleneck caused by constrained control handling via either off-device communication or expensive tag-matching mechanisms. More importantly, mapping control flow onto CGRAs is extremely arduous and time-consuming due to intricate instruction structures and hardware mechanisms. To address these limitations, we propose Raccoon, a portable and lightweight framework for CGRAs targeting a wide range of control flows. Raccoon comprises a comprehensive approach that spans microarchitecture, HW/SW interface, and compiler aspects. Regarding microarchitecture, Raccoon incorporates specialized infrastructure for branch- and loop-level control patterns with concise execution mechanisms. The HW/SW interface of Raccoon includes well-characterized abstractions and instruction sets tailored for easy compilation, featuring custom operators and architectural models for control-oriented units. On the compiler front, Raccoon integrates advanced control handling techniques and employs a portable mapper leveraging reinforcement learning and Monte Carlo tree search. This enables agile mapping and optimization of the entire program, ensuring efficient execution and high-quality results. Through this cohesive co-design, Raccoon can empower various CGRAs with robust control-flow handling capabilities, surpassing conventional tagged mechanisms in terms of hardware efficiency and compiler adaptability. Evaluation results show that Raccoon achieves up to a 5.78× improvement in energy efficiency and a 2.24× reduction in cycle count over state-of-the-art CGRAs. Raccoon stands out for its versatility in managing intricate control flows and showcases remarkable portability across diverse CGRA architectures.
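The abstract mentions a portable mapper that leverages Monte Carlo tree search (among other techniques) to place a program onto the CGRA fabric. The paper's actual mapper, cost model, and search formulation are not reproduced here; as a rough, self-contained illustration of the MCTS half of that idea only, the sketch below uses UCT selection to place a 4-operation chain onto a toy 2×2 PE grid while minimising routing distance. Every name, the grid size, and the wire-length cost model are invented for this example.

```python
import math
import random

# Toy CGRA mapping: place a chain of operations op0 -> op1 -> ... -> op3
# onto a 2x2 PE grid, minimising the total Manhattan routing distance
# between producer and consumer PEs. (Illustrative only; not Raccoon's
# actual mapper or cost model.)
PES = [(x, y) for x in range(2) for y in range(2)]
N_OPS = 4

def cost(placement):
    """Total wire length of a complete placement (lower is better)."""
    return sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
               for a, b in zip(placement, placement[1:]))

def rollout(placement):
    """Finish a partial placement with random legal choices."""
    free = [p for p in PES if p not in placement]
    random.shuffle(free)
    return placement + free[: N_OPS - len(placement)]

def mcts(iterations=2000, c=1.4):
    # stats[state] = [visits, total_reward]; a state is a tuple of placed PEs
    stats = {(): [0, 0.0]}
    for _ in range(iterations):
        state, path = (), [()]
        # Selection: descend via UCT while all children are already expanded
        while len(state) < N_OPS:
            children = [state + (p,) for p in PES if p not in state]
            unexplored = [ch for ch in children if ch not in stats]
            if unexplored:
                state = random.choice(unexplored)  # expansion
                stats[state] = [0, 0.0]
                path.append(state)
                break
            state = max(children, key=lambda ch:
                        stats[ch][1] / stats[ch][0]
                        + c * math.sqrt(math.log(stats[state][0]) / stats[ch][0]))
            path.append(state)
        # Simulation: random completion; reward = negated routing cost
        reward = -cost(rollout(list(state)))
        # Backpropagation along the selected path
        for s in path:
            stats[s][0] += 1
            stats[s][1] += reward
    # Return the best fully placed state found by the search
    best = max((s for s in stats if len(s) == N_OPS),
               key=lambda s: stats[s][1] / stats[s][0])
    return list(best), cost(list(best))

random.seed(0)
placement, wire_len = mcts()
print(placement, wire_len)
```

With a 2×2 grid and 4 operations the search space has only 24 complete placements, so the search reliably finds an optimal snake-shaped placement of total wire length 3; a real mapper faces a vastly larger space plus routing, timing, and control-pattern constraints, which is why the paper pairs the search with learned guidance.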
Journal Description
IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to:
a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms, scalability of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling, and load balancing.
b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowd sourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems.
c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale systems designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation.
d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.