A cache-aware DAG scheduling method on multicores: Exploiting node affinity and deferred executions

IF 3.7 · CAS Tier 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Huixuan Yi , Yuanhai Zhang , Zhiyang Lin , Haoran Chen , Yiyang Gao , Xiaotian Dai , Shuai Zhao
Citations: 0

Abstract

With increasingly complex functionalities being implemented in emerging applications, multicores with a layered cache hierarchy are widely adopted, and Directed Acyclic Graphs (DAGs) are commonly employed to model the execution dependencies between tasks. For such systems, scheduling methods can be designed to effectively leverage the cache to accelerate system execution. However, traditional methods either do not consider DAGs, or rely on sophisticated static analysis to produce fixed scheduling solutions that require additional hardware support (e.g., cache partitioning and colouring), which undermines both the applicability and flexibility of these methods. Recently, an online cache-aware DAG scheduling method was presented that schedules DAGs using an execution time model with caching effects considered, eliminating the need for static analysis and additional hardware support. However, this method relies on simple heuristics with limited consideration of both the allocatable cores and the competition between nodes, resulting in intensive inter-node contention that undermines cache performance. This paper proposes CADE, a cache-aware scheduling method for DAG tasks that leverages the cache to reduce the DAG makespan. To achieve this, an affinity-aware priority assignment is first constructed that mitigates the competition among nodes for the preferred cores on which they can hit the cache. Then, a contention-aware allocation mechanism is constructed, which (i) accounts for the impact of an allocation decision on the speed-up of other nodes; and (ii) includes busy cores in the allocation by enabling deferred execution, effectively enhancing cache performance to accelerate DAG execution. Experiments show that, compared to the state-of-the-art, CADE significantly reduces the DAG makespan by 24.02% on average (up to 33%), with the cache miss rate reduced by 22.06% on average.
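The abstract stays at the mechanism level. As a rough, non-authoritative illustration of how cache affinity and deferred execution can interact in a greedy DAG list scheduler, the sketch below assumes a simple model: a node that runs on a core where one of its predecessors ran executes faster by a warm-cache factor, and "deferred execution" falls out of choosing the core with the earliest finish time, since waiting for a busy but cache-warm core can beat starting cold on an idle one. This is a toy sketch, not the paper's CADE algorithm; the function names, the `warm_factor` speed-up model, and the min-finish-time rule are all assumptions.

```python
import heapq
from collections import defaultdict

def downward_rank(succ, wcet):
    """Longest-path-to-exit priority: critical-path nodes schedule first."""
    rank = {}
    def visit(n):
        if n not in rank:
            rank[n] = wcet[n] + max((visit(s) for s in succ.get(n, [])), default=0.0)
        return rank[n]
    for n in wcet:
        visit(n)
    return rank

def schedule(succ, wcet, num_cores, warm_factor=0.5):
    """Greedy list scheduler over a DAG given as {node: [successors]}.

    Returns ({node: (core, finish_time)}, makespan). A node's execution time
    is scaled by warm_factor on any core that ran one of its predecessors.
    """
    pred = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    rank = downward_rank(succ, wcet)
    indeg = {n: len(pred[n]) for n in wcet}
    ready = [(-rank[n], n) for n in wcet if indeg[n] == 0]
    heapq.heapify(ready)
    core_free = [0.0] * num_cores      # earliest time each core is idle
    placed = {}                        # node -> (core, finish time)
    while ready:
        _, n = heapq.heappop(ready)
        data_ready = max((placed[p][1] for p in pred[n]), default=0.0)
        warm_cores = {placed[p][0] for p in pred[n]}
        best = None
        for c in range(num_cores):
            start = max(core_free[c], data_ready)
            cost = wcet[n] * (warm_factor if c in warm_cores else 1.0)
            # Deferred execution emerges here: a busy warm core can still
            # yield the earliest finish despite a later start.
            if best is None or start + cost < best[0]:
                best = (start + cost, c)
        finish, c = best
        core_free[c] = finish
        placed[n] = (c, finish)
        for s in succ.get(n, []):
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-rank[s], s))
    return placed, max(f for _, f in placed.values())
```

For example, with a fork `A -> B, A -> C` and a strong warm-cache speed-up, both children queue up on A's core rather than spreading to the idle core, because the deferred warm start finishes earlier than an immediate cold one.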
Source journal
Journal of Systems Architecture (Engineering & Technology – Computer Science: Hardware)
CiteScore: 8.70
Self-citation rate: 15.60%
Publications per year: 226
Review time: 46 days
Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.