COFFA: A Co-Design Framework for Fused-Grained Reconfigurable Architecture Towards Efficient Irregular Loop Handling
Authors: Yuan Dai; Xuchen Gao; Yunhui Qiu; Jingyuan Li; Yuhang Cao; Yiqing Mao; Sichao Chen; Wenbo Yin; Wai-Shing Luk; Lingli Wang
Journal: IEEE Transactions on Computers, vol. 74, no. 9, pp. 3099-3113 (published 2025-07-02)
DOI: 10.1109/TC.2025.3585345
URL: https://ieeexplore.ieee.org/document/11062918/
Coarse-Grained Reconfigurable Architecture (CGRA) has emerged as a competitive accelerator thanks to its high flexibility and energy efficiency. However, most CGRAs are effective for computation-intensive applications with regular loops but struggle with irregular loops that contain control flow. Such loops introduce fine-grained logic operations that are costly to execute on the coarse-grained arithmetic units of a CGRA. Handling these logic operations efficiently requires Boolean algebra optimization, which can improve logic density and reduce logic depth. Unfortunately, no previous research has incorporated it into the compilation flow to support irregular loops efficiently. We propose COFFA, an open-source framework for a heterogeneous architecture that pairs a RISC-V CPU with a fused-grained reconfigurable accelerator; the accelerator integrates coarse-grained arithmetic units and fine-grained logic units, along with flexible IO units and distributed interconnects. As a software/hardware co-design framework, COFFA provides a compiler that extracts and optimizes fine-grained logic operations from irregular loops, performs coarse-grained arithmetic and memory optimizations, and offloads the loops to the accelerator. Across various challenging benchmarks with irregular loops, COFFA achieves significant performance and energy-efficiency improvements over an in-order RISC-V CPU, an out-of-order RISC-V CPU, and a recent FPGA. Moreover, compared with the state-of-the-art CGRAs UE-CGRA and Hycube, COFFA achieves 2.5$\times$ and 3.5$\times$ performance gains, respectively.
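To make the abstract's central idea concrete, the following is a minimal illustrative sketch and not COFFA's compiler or any part of its code base. It assumes a toy loop with data-dependent branches, shows how if-conversion turns those branches into 1-bit predicates combined by logic operations, and then checks by exhaustive truth table that a Boolean expression over those predicates can be simplified to a shallower one. All function names (`irregular_loop`, `if_converted_loop`, `equivalent`) are hypothetical and chosen only for this example.

```python
from itertools import product


def irregular_loop(xs, lo, hi):
    """Toy loop with data-dependent control flow (an 'irregular' loop)."""
    acc = 0
    for x in xs:
        if x > lo and x < hi:
            acc += x
        elif x > lo and not (x < hi):
            acc += hi
    return acc


def if_converted_loop(xs, lo, hi):
    """Same loop after if-conversion: branches become 1-bit predicates."""
    acc = 0
    for x in xs:
        p = x > lo               # 1-bit predicate
        q = x < hi               # 1-bit predicate
        take_x = p and q         # fine-grained logic operation
        take_hi = p and not q    # fine-grained logic operation
        acc += x if take_x else (hi if take_hi else 0)
    return acc


def equivalent(f, g, n):
    """Compare two n-input Boolean functions on every input combination."""
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=n))


# "Either branch fires" is (p AND q) OR (p AND NOT q); Boolean algebra
# reduces it to just p, removing three gates and two logic levels.
def any_branch_original(p, q):
    return (p and q) or (p and not q)


def any_branch_simplified(p, q):
    return p


data = [1, 5, 9, 12]
same_result = irregular_loop(data, 3, 10) == if_converted_loop(data, 3, 10)
same_logic = equivalent(any_branch_original, any_branch_simplified, 2)
```

On a coarse-grained array, each of these single-bit operations would otherwise occupy a full word-wide arithmetic unit, which is the cost the abstract attributes to irregular loops. The simplification step here is a hand-picked identity; a real flow would rely on a logic synthesis pass rather than a manual rewrite.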
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.