NURA: A Framework for Supporting Non-Uniform Resource Accesses in GPUs

Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems. Pub Date: 2022-06-06. DOI: 10.1145/3489048.3522656

Sina Darabi, Negin Mahani, Hazhir Bakhishi, Ehsan Yousefzadeh-Asl-Miandoab, Mohammad Sadrosadati, H. Sarbazi-Azad


Citations: 8

Abstract

Multi-application execution on Graphics Processing Units (GPUs) is a promising way to utilize GPU resources, but it remains challenging. Some prior approaches (e.g., spatial multitasking) offer limited opportunity to improve resource utilization, while others (e.g., simultaneous multi-kernel) provide fine-grained resource sharing at the price of unfair execution. This paper proposes a new multi-application paradigm for GPUs, called NURA, that offers high potential to improve resource utilization while ensuring fairness and Quality-of-Service (QoS). The key idea is that each streaming multiprocessor (SM) executes Cooperative Thread Arrays (CTAs) that belong to only one application (similar to spatial multitasking) and shares its unused resources with SMs running other applications that demand more resources. NURA handles the resource-sharing process mainly in software to provide simplicity, low hardware overhead, and flexibility. We also make some hardware modifications as architectural support for our software-based proposal. Our conservative analysis reveals that the hardware area overhead of our proposal is less than 1.07% of the whole GPU die. Our experimental results over various mixes of GPU workloads show that NURA improves throughput by 26%, on average, compared to state-of-the-art spatial multitasking, while meeting QoS targets. In terms of fairness, NURA achieves results close to those of spatial multitasking, while it outperforms simultaneous multi-kernel by 76%, on average.
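The key idea in the abstract (each SM runs CTAs of a single application and lends its unused resources to SMs whose applications need more) can be illustrated with a toy model. This is only a sketch under stated assumptions, not NURA's actual mechanism: the `SM` record, the single abstract "resource unit", and the greedy first-come lending policy in `share_unused` are all hypothetical choices made for illustration. The abstract does not specify which resources are shared or how the software layer allocates them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SM:
    """Hypothetical model of one streaming multiprocessor."""
    app: str       # the single application whose CTAs run on this SM
    capacity: int  # abstract resource units physically present on this SM
    demand: int    # units this SM's application would like to use


def share_unused(sms: List[SM]) -> List[int]:
    """Return the units granted to each SM after lending spare capacity.

    Assumed policy (not taken from the paper): every SM first serves its
    own application from local capacity, then leftover units are lent
    greedily, in index order, to SMs whose demand exceeds their capacity.
    """
    # Step 1: each SM satisfies as much of its own demand as it can locally.
    granted = [min(sm.capacity, sm.demand) for sm in sms]

    # Step 2: pool the units that under-utilizing SMs leave idle.
    spare = sum(sm.capacity - g for sm, g in zip(sms, granted))

    # Step 3: lend pooled units to SMs that still have unmet demand.
    for i, sm in enumerate(sms):
        shortfall = sm.demand - granted[i]
        if shortfall > 0 and spare > 0:
            loan = min(shortfall, spare)
            granted[i] += loan
            spare -= loan
    return granted


if __name__ == "__main__":
    sms = [SM(app="A", capacity=100, demand=40),
           SM(app="B", capacity=100, demand=150)]
    print(share_unused(sms))
```

In this toy setting the SM running application B ends up using more units than it physically holds, which is the "non-uniform resource access" the title refers to. Under plain spatial multitasking (no lending), B would be capped at its own 100 units while 60 units on A's SM sat idle.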
