Title: SaaP: Rearchitect SoC-as-a-Processor to Orchestrate Hardware Heterogeneity
Authors: Pengwei Jin, Zhe Fan, Yongwei Zhao, Zidong Du, Hongrui Guo, Ziyuan Nan, Yifan Hao, Chongxiao Li, Tianyun Ma, Zhenxing Zhang, Xiaqing Li, Wei Li, Xing Hu, Qi Guo, Zhiwei Xu, Tianshi Chen
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3962-3975
Publication date: 2025-03-20
Publication type: Journal Article
DOI: 10.1109/TCAD.2025.3553074
URL: https://ieeexplore.ieee.org/document/10935670/
Impact factor: 2.9 (JCR Q2, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
With the end of Moore’s Law and Dennard scaling, domain-specific accelerators (DSAs) have undergone a Cambrian explosion. As computing advances into the intelligent era, more and more DSAs are integrated into systems-on-chip (SoCs) as intellectual property (IP) blocks to provide high performance and efficiency. Currently, IPs usually expose IP-dependent hardware interfaces, so SoCs must manage them as isolated devices through software running on the host CPU. However, such software-managed heterogeneity in CPU-centric SoCs leads to low IP utilization. This inefficiency arises from the dependence on software optimization, coupled with control and data exchange overheads. To improve the IP utilization of heterogeneous SoCs, in this article we rearchitect the SoC as a processor (SaaP) to orchestrate hardware heterogeneity. SaaP features an orchestration pipeline in which DSAs are integrated as execution units and managed directly by the hardware pipeline, concealing the hardware heterogeneity from software. Moreover, SaaP redesigns the register file and data paths to implement an IP-level data-forwarding mechanism, avoiding the costly control and data exchange of the CPU-centric execution model. Block data dependences among different DSAs are carefully resolved to exploit mixed-level parallelism and inter-IP data exchange. SaaP abstracts tasks as mixed-scale instructions, each of which can be mapped to different IPs. Experimental results show that, compared against Xavier on six fully software-optimized benchmarks from different domains, a SaaP-rearchitected Xavier achieves a $2.08\times$ speedup, with an 8.21% area reduction and only a 2.98% increase in power consumption.
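The execution model described in the abstract can be pictured with a small toy model. The following is only an illustrative sketch, not the authors' design: the names `Instr`, `ExecUnit`, `ToyPipeline`, and `run_demo`, the opcodes, and the "first capable unit" mapping policy are all assumptions made for this example. It shows instructions that do not name a device, a pipeline that picks an IP to run each one, and a shared block register file through which one unit's result reaches the next without a host-side copy.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    """Hypothetical mixed-scale instruction: an opcode plus block-register operands."""
    op: str
    dst: str
    srcs: tuple


@dataclass
class ExecUnit:
    """Hypothetical IP block (CPU core, DSA, ...) treated as an execution unit."""
    name: str
    ops: dict  # opcode -> Python callable standing in for the hardware


@dataclass
class ToyPipeline:
    units: list
    regs: dict = field(default_factory=dict)   # block registers shared by all units
    trace: list = field(default_factory=list)  # (opcode, unit name) per issued instruction

    def issue(self, instr):
        # Assumed policy: pick the first unit that implements the opcode.
        # The program itself never names a unit.
        unit = next(u for u in self.units if instr.op in u.ops)
        # Operands come from the shared block registers, so a result produced
        # by one unit is read directly by the next one.
        args = [self.regs[s] for s in instr.srcs]
        self.regs[instr.dst] = unit.ops[instr.op](*args)
        self.trace.append((instr.op, unit.name))


def run_demo():
    cpu = ExecUnit("cpu", {"add": lambda a, b: [x + y for x, y in zip(a, b)],
                           "scale": lambda a: [2 * x for x in a]})
    dsa = ExecUnit("vec_dsa", {"scale": lambda a: [2 * x for x in a]})
    pipe = ToyPipeline(units=[dsa, cpu])  # list order is the toy mapping priority
    pipe.regs.update({"b0": [1, 2, 3], "b1": [10, 20, 30]})
    program = [Instr("add", "b2", ("b0", "b1")),
               Instr("scale", "b3", ("b2",))]
    for instr in program:
        pipe.issue(instr)
    return pipe.regs["b3"], pipe.trace
```

Calling `run_demo()` runs `add` on the toy CPU (the only unit that implements it) and `scale` on the toy DSA, which consumes the block `b2` straight from the shared registers. The real SaaP pipeline additionally resolves block data dependences to run units in parallel, which this sequential sketch does not attempt to model.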
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.