Efficient Dynamic Resource Management for Spatial Multitasking GPUs

IF 5 | Zone 2, Computer Science | JCR Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Cloud Computing Pub Date : 2024-12-05 DOI:10.1109/TCC.2024.3511548

Hoda Sedighi;Daniel Gehberger;Amin Ebrahimzadeh;Fetahi Wuhib;Roch H. Glitho

IEEE Transactions on Cloud Computing, vol. 13, no. 1, pp. 99-117. Journal Article. https://ieeexplore.ieee.org/document/10778657/

Citations: 0

Abstract

The advent of microservice architecture enables complex cloud applications to be realized via a set of individually isolated components, increasing their flexibility and performance. As these applications require massive computing resources, graphics processing units (GPUs) are widely used as high-speed parallel computing devices to meet the stringent demands. Although current GPUs allow application components to be executed concurrently via spatial multitasking, they face several challenges. The first challenge is allocating the computing resources to components dynamically to maximize efficiency. The second challenge is avoiding performance degradation caused by the data transfer overhead between the components. To address these challenges, we propose an efficient GPU resource management technique that dynamically allocates GPU resources to application components. The proposed method allocates resources based on component workloads and uses online performance monitoring to guarantee the application's performance. We also propose a GPU memory manager that reduces the data transfer overhead between components via shared memory. Our evaluation results indicate that the proposed dynamic resource allocation method improves application throughput by up to 134.12% compared to state-of-the-art spatial multitasking techniques. We also show that using shared memory yields a 6x throughput improvement compared to the baseline User Datagram Protocol (UDP)-based technique.
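The abstract does not give the allocation algorithm itself, so the following is only a minimal sketch of the general idea it describes: splitting a fixed pool of GPU compute units among components in proportion to their workloads, then correcting that split from online performance measurements. Everything named here is an assumption made for illustration and is not taken from the paper: the `Component` fields, the use of latency as the monitored metric, the largest-remainder rounding, and the one-unit-per-step rebalancing rule.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical description of one application component."""
    name: str
    workload: float             # assumed load metric, e.g. requests per second
    target_latency_ms: float    # assumed performance target
    observed_latency_ms: float  # assumed value reported by online monitoring


def proportional_shares(components, total_units):
    """Split total_units among components in proportion to workload.

    Each component is guaranteed one unit; the remaining units are divided
    by workload, with leftovers going to the largest fractional remainders.
    """
    n = len(components)
    if total_units < n:
        raise ValueError("need at least one unit per component")
    spare = total_units - n
    total_load = sum(c.workload for c in components)
    if total_load <= 0:
        exact = [spare / n] * n
    else:
        exact = [spare * c.workload / total_load for c in components]
    shares = [1 + int(x) for x in exact]
    leftover = total_units - sum(shares)
    by_remainder = sorted(range(n), key=lambda i: exact[i] - int(exact[i]),
                          reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return {c.name: s for c, s in zip(components, shares)}


def rebalance(shares, components):
    """One monitoring step: move a single unit from the component with the
    most latency slack to the component that most exceeds its target."""
    shares = dict(shares)
    ratio = {c.name: c.observed_latency_ms / c.target_latency_ms
             for c in components}
    worst = max(ratio, key=ratio.get)
    best = min(ratio, key=ratio.get)
    if ratio[worst] > 1.0 and worst != best and shares[best] > 1:
        shares[best] -= 1
        shares[worst] += 1
    return shares


if __name__ == "__main__":
    pipeline = [
        Component("decoder", 30.0, 10.0, 5.0),
        Component("detector", 60.0, 20.0, 30.0),
        Component("tracker", 10.0, 10.0, 8.0),
    ]
    initial = proportional_shares(pipeline, total_units=13)
    print(initial)                       # {'decoder': 4, 'detector': 7, 'tracker': 2}
    print(rebalance(initial, pipeline))  # {'decoder': 3, 'detector': 8, 'tracker': 2}
```

In this toy run the detector misses its assumed latency target, so the monitoring step takes one unit from the decoder, which has the most slack. A real system would map these abstract units onto whatever partitioning mechanism the GPU exposes, which this sketch deliberately leaves out.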


Source journal

IEEE Transactions on Cloud Computing (Computer Science-Software)

CiteScore: 9.40

Self-citation rate: 6.20%

Articles published: 167

About the journal: The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.