SaaP: Rearchitect SoC-as-a-Processor to Orchestrate Hardware Heterogeneity

IF 2.9 · CAS Zone 3 (Computer Science) · JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Pengwei Jin;Zhe Fan;Yongwei Zhao;Zidong Du;Hongrui Guo;Ziyuan Nan;Yifan Hao;Chongxiao Li;Tianyun Ma;Zhenxing Zhang;Xiaqing Li;Wei Li;Xing Hu;Qi Guo;Zhiwei Xu;Tianshi Chen
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3962-3975, published 2025-03-20. DOI: 10.1109/TCAD.2025.3553074. https://ieeexplore.ieee.org/document/10935670/
Citations: 0

Abstract

Due to the end of Moore's Law and Dennard scaling, Domain-Specific Accelerators (DSAs) have undergone a Cambrian explosion. Especially as computing advances into the intelligent era, more and more DSAs are integrated into Systems-on-Chip (SoCs) as intellectual property (IP) blocks to provide high performance and efficiency. Currently, IPs usually expose IP-dependent hardware interfaces, requiring SoCs to manage them as isolated devices through software running on the host CPU. However, such software-managed heterogeneity in CPU-centric SoCs leads to low IP utilization. This inefficiency arises from the dependence on software optimization, coupled with control and data-exchange overheads. To improve the IP utilization of heterogeneous SoCs, in this article we rearchitect the SoC as a processor (i.e., SaaP) to orchestrate hardware heterogeneity. SaaP features an orchestration pipeline in which DSAs are integrated as execution units and managed directly by the hardware pipeline, concealing the hardware heterogeneity from software. Moreover, SaaP redesigns the register file and data paths to implement an IP-level data-forwarding mechanism, avoiding the costly control and data exchange of the CPU-centric execution model. Block data dependences among different DSAs are carefully resolved to exploit mixed-level parallelism and inter-IP data exchange. SaaP abstracts tasks as mixed-scale instructions, each of which can be mapped to different IPs. Experimental results show that, compared with Xavier on six fully software-optimized benchmarks from different domains, the SaaP-rearchitected Xavier achieves a 2.08× speedup, with an 8.21% area reduction and only a 2.98% increase in power consumption.
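The abstract describes the orchestration pipeline only at a high level. The Python sketch below is a deliberately simplified toy model, not the authors' microarchitecture or evaluation method. It assumes that each mixed-scale instruction names exactly one IP, reads and writes whole data blocks held in block registers with unique (already renamed) destinations, and that a finished block is forwarded directly to the consuming IP. The CPU-centric baseline charges an invented per-task control overhead and a per-input data-exchange overhead, with the host waiting for each IP in turn. All names (`Instr`, `schedule_pipelined`, `schedule_cpu_centric`), IP labels, and cycle counts are hypothetical and have no relation to the reported 2.08× result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical mixed-scale instruction: one coarse-grained task bound to one IP."""
    name: str
    ip: str                 # target execution unit, e.g. "NPU" (illustrative label)
    srcs: tuple[str, ...]   # block registers read by the task
    dst: str                # block register written by the task
    latency: int            # cycles the IP needs for the task (invented numbers)


def schedule_pipelined(program: list[Instr], forward_cycles: int = 0) -> dict[str, int]:
    """Toy orchestration pipeline: dispatch in program order; each instruction waits
    only for its source blocks (forwarded IP-to-IP) and for its own IP to be idle,
    so independent instructions on different IPs overlap. Returns finish cycles."""
    dsts = [ins.dst for ins in program]
    assert len(dsts) == len(set(dsts)), "toy model assumes unique (renamed) destinations"
    reg_ready: dict[str, int] = {}  # block register -> cycle its data is available
    ip_free: dict[str, int] = {}    # IP -> cycle it becomes idle
    finish: dict[str, int] = {}
    for ins in program:
        # Registers never written by the program are treated as inputs ready at cycle 0.
        operands_ready = max((reg_ready.get(r, 0) for r in ins.srcs), default=0)
        start = max(operands_ready, ip_free.get(ins.ip, 0))
        end = start + ins.latency
        ip_free[ins.ip] = end
        reg_ready[ins.dst] = end + forward_cycles  # assumed IP-to-IP forwarding delay
        finish[ins.name] = end
    return finish


def schedule_cpu_centric(program: list[Instr], launch_cycles: int, copy_cycles: int) -> int:
    """Toy CPU-centric baseline: the host launches each IP in turn and waits for it,
    paying a control overhead per task and a data-exchange overhead per input block."""
    t = 0
    for ins in program:
        t += launch_cycles + copy_cycles * len(ins.srcs) + ins.latency
    return t


if __name__ == "__main__":
    prog = [
        Instr("preprocess", "ISP", ("raw",), "img", 40),
        Instr("detect", "NPU", ("img",), "feat", 60),
        Instr("flow", "VPU", ("img",), "motion", 50),
        Instr("fuse", "GPU", ("feat", "motion"), "out", 20),
    ]
    pipelined = max(schedule_pipelined(prog).values())
    baseline = schedule_cpu_centric(prog, launch_cycles=10, copy_cycles=5)
    print(pipelined, baseline, round(baseline / pipelined, 2))
```

With these invented numbers the script prints `120 235 1.96`: the two IPs that consume `img` run concurrently, and the toy baseline pays launch and copy costs on every task. The gap on a real SoC depends on how well host software already overlaps IP work, which this baseline deliberately ignores.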
Source journal
CiteScore: 5.60
Self-citation rate: 13.80%
Articles published: 500
Review time: 7 months
Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.