SaaP: Rearchitect SoC-as-a-Processor to Orchestrate Hardware Heterogeneity

Impact factor: 2.9 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. Pub Date: 2025-03-20. DOI: 10.1109/TCAD.2025.3553074

Pengwei Jin; Zhe Fan; Yongwei Zhao; Zidong Du; Hongrui Guo; Ziyuan Nan; Yifan Hao; Chongxiao Li; Tianyun Ma; Zhenxing Zhang; Xiaqing Li; Wei Li; Xing Hu; Qi Guo; Zhiwei Xu; Tianshi Chen

Volume 44, Issue 10, pages 3962-3975. Full text: https://ieeexplore.ieee.org/document/10935670/

Citations: 0

Abstract



SaaP: Rearchitect SoC-as-a-Processor to Orchestrate Hardware Heterogeneity

Due to the end of Moore’s Law and Dennard Scaling, Domain-Specific Accelerators (DSAs) have come to a Cambrian explosion. Especially when advancing into the intelligent era, more and more DSAs are integrated into System-on-Chips (SoCs) as intellectual property (IP) blocks to provide high performance and efficiency. Currently, IPs usually expose IP-dependent hardware interfaces, requiring SoCs to manage them as isolated devices with software running on the host CPU. However, such software-managed heterogeneity in CPU-centric SoCs leads to low IP utilization. This inefficiency arises from the dependence on software optimization, coupled with the control and data exchange overheads. To improve IP utilization of heterogeneous SoCs, in this article, we rearchitect the SoC as a processor (i.e., SaaP) to orchestrate hardware heterogeneity. SaaP features an orchestration pipeline where DSAs are integrated as execution units and managed directly by the hardware pipeline to conceal the hardware heterogeneity from software. Moreover, SaaP redesigns the register file and data paths to implement an IP-level data-forwarding mechanism, avoiding the costly control and data exchange in the CPU-centric execution model. Block data dependence among different DSAs is carefully resolved to exploit mixed-level parallelism and inter-IP data exchange. SaaP abstracts tasks as mixed-scale instructions, where each instruction can be mapped to different IPs. Experimental results show that compared against Xavier on six fully software-optimized benchmarks from different domains, SaaP-rearchitected Xavier achieves a $2.08\times$ speedup, with an 8.21% area reduction and only 2.98% increase in power consumption.
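
The abstract describes instructions that can each be mapped to different IPs, with block data dependences resolved in hardware. The toy model below is only an illustration of that general idea and not the paper's design: it is a greedy, program-order scheduler in which each instruction waits for its source blocks and is then placed on whichever capable IP would finish it earliest. The names `Instr`, `schedule`, `makespan`, the IP labels (`npu`, `dsp`, `gpu`, `cpu`), the block names, and all cycle counts are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    dst: str        # block "register" written by this instruction
    srcs: tuple     # block registers read by this instruction
    latency: dict   # IP label -> cycles on that IP (made-up numbers)


def schedule(program):
    """Greedy scheduling in program order.

    An instruction may start once all of its source blocks have been
    produced and the chosen IP is idle. Among the IPs able to run it,
    the one with the earliest finish time is picked.
    """
    ready_at = {}    # block register -> cycle its data becomes available
    ip_free_at = {}  # IP label -> cycle the IP becomes idle
    timeline = []
    for ins in program:
        if not ins.latency:
            raise ValueError(f"no IP can execute {ins.name}")
        # Blocks never written by the program are treated as inputs ready at 0.
        operands_ready = max((ready_at.get(s, 0) for s in ins.srcs), default=0)
        best = None
        for ip, cycles in ins.latency.items():
            start = max(operands_ready, ip_free_at.get(ip, 0))
            finish = start + cycles
            if best is None or finish < best[2]:
                best = (ip, start, finish)
        ip, start, finish = best
        ip_free_at[ip] = finish
        ready_at[ins.dst] = finish  # consumers can read the block from here on
        timeline.append((ins.name, ip, start, finish))
    return timeline


def makespan(timeline):
    """Cycle at which the last instruction finishes."""
    return max(finish for _, _, _, finish in timeline)


# Hypothetical three-instruction program: two independent producers
# followed by a consumer that depends on both results.
PROGRAM = [
    Instr("conv", "b1", ("b0",), {"npu": 4, "gpu": 6}),
    Instr("resize", "b2", ("b0",), {"dsp": 3, "gpu": 5}),
    Instr("add", "b3", ("b1", "b2"), {"gpu": 2, "cpu": 8}),
]

if __name__ == "__main__":
    for entry in schedule(PROGRAM):
        print(entry)
    print("makespan:", makespan(schedule(PROGRAM)))
```

In this made-up program the two producers run concurrently on different IPs and the consumer starts as soon as the later of the two finishes, which is the kind of cross-IP parallelism the abstract refers to. The model ignores data-movement cost entirely, so it says nothing about the reported speedup.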


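Source journal

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (Engineering Technology - Engineering: Electrical & Electronic)

CiteScore: 5.60
Self-citation rate: 13.80%
Articles published: 500
Review time: 7 months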

Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.