Hoda Sedighi;Daniel Gehberger;Amin Ebrahimzadeh;Fetahi Wuhib;Roch H. Glitho
{"title":"Efficient Dynamic Resource Management for Spatial Multitasking GPUs","authors":"Hoda Sedighi;Daniel Gehberger;Amin Ebrahimzadeh;Fetahi Wuhib;Roch H. Glitho","doi":"10.1109/TCC.2024.3511548","DOIUrl":null,"url":null,"abstract":"The advent of microservice architecture enables complex cloud applications to be realized via a set of individually isolated components, increasing their flexibility and performance. As these applications require massive computing resources, graphics processing units (GPUs) are being widely used as high-speed parallel computing devices to meet the stringent demands. Although current GPUs allow application components to be executed concurrently via spatial multitasking, they face several challenges. The first challenge is allocating the computing resources to components dynamically to maximize efficiency. The second challenge is avoiding performance degradation caused by the data transfer overhead between the components. To address these challenges, we propose an efficient GPU resource management technique that dynamically allocates GPU resources to application components. The proposed method allocates resources based on component workloads and uses online performance monitoring to guarantee the application's performance. We also propose a GPU memory manager to reduce the data transfer overhead between components via shared memory. Our evaluation results indicate that the proposed dynamic resource allocation method improves application throughput by up to 134.12% compared to the state-of-the-art spatial multitasking techniques. We also show that using a shared memory results in 6x throughput improvement compared to the baseline User Datagram Protocol (UDP)-based technique.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"13 1","pages":"99-117"},"PeriodicalIF":5.3000,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10778657/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The advent of microservice architecture enables complex cloud applications to be realized via a set of individually isolated components, increasing their flexibility and performance. As these applications require massive computing resources, graphics processing units (GPUs) are being widely used as high-speed parallel computing devices to meet the stringent demands. Although current GPUs allow application components to be executed concurrently via spatial multitasking, they face several challenges. The first challenge is allocating the computing resources to components dynamically to maximize efficiency. The second challenge is avoiding performance degradation caused by the data transfer overhead between the components. To address these challenges, we propose an efficient GPU resource management technique that dynamically allocates GPU resources to application components. The proposed method allocates resources based on component workloads and uses online performance monitoring to guarantee the application's performance. We also propose a GPU memory manager to reduce the data transfer overhead between components via shared memory. Our evaluation results indicate that the proposed dynamic resource allocation method improves application throughput by up to 134.12% compared to the state-of-the-art spatial multitasking techniques. We also show that using a shared memory results in 6x throughput improvement compared to the baseline User Datagram Protocol (UDP)-based technique.
期刊介绍:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.