GPES: a preemptive execution system for GPGPU computing

21st IEEE Real-Time and Embedded Technology and Applications Symposium. Publication date: 2015-04-13. DOI: 10.1109/RTAS.2015.7108420

Husheng Zhou, G. Tong, Cong Liu


Citations: 60

Abstract

Graphics processing units (GPUs) are widely used as co-processors in many application domains to accelerate computationally intensive general-purpose workloads, a practice known as GPGPU computing. Real-time multi-tasking support is a critical requirement for many emerging GPGPU computing domains. However, because GPU processing is asynchronous and non-preemptive, a higher-priority task in a multi-tasking environment may be blocked by lower-priority tasks for a lengthy duration. This severely harms the system's timing predictability and seriously limits the applicability of GPGPU in many real-time and embedded systems. In this paper, we present an efficient GPGPU preemptive execution system (GPES), which combines user-level and driver-level runtime engines to reduce the pending time of high-priority GPGPU tasks that may be blocked by long-running, low-priority competing workloads. At user level, GPES automatically slices a long-running kernel execution into multiple sub-kernel launches and splits data transactions into multiple chunks. At driver level, it then inserts preemption points between the sub-kernel launches and memory-copy operations. We implement a prototype of GPES and evaluate it using real-world benchmarks and case studies. Experimental results demonstrate that GPES reduces the pending time of high-priority tasks in a multi-tasking environment by up to 90% compared with existing GPU driver solutions, while introducing small overheads.
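The abstract does not describe how GPES is implemented. The Python sketch below is therefore only a toy timing model of the general idea, not GPES itself. It shows why slicing a kernel into sub-kernel launches and chunking a memory copy shortens the time a high-priority task waits. It assumes a single GPU engine that runs one operation at a time to completion, so the gaps between operations are the only preemption points. All function names (`slice_kernel`, `split_copy`, `simulate`, `high_priority_pending_time`), the workload sizes, and the abstract time units are hypothetical, and launch and copy overheads are not modeled.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical model: one GPU engine executes one operation at a time, and each
# operation (sub-kernel launch or memcpy chunk) runs to completion. The gaps
# between operations are the only preemption points. Times are abstract units.


@dataclass(eq=False)  # identity comparison, so ready.remove() drops the exact task
class Task:
    name: str
    priority: int  # larger value = higher priority (assumption)
    ops: deque     # remaining (kind, index_range, duration) operations


def slice_kernel(total_blocks, blocks_per_launch, time_per_block):
    """Split one kernel over `total_blocks` thread blocks into sub-kernel
    launches, each covering a contiguous block-index range [start, end)."""
    ops = []
    for start in range(0, total_blocks, blocks_per_launch):
        end = min(start + blocks_per_launch, total_blocks)
        ops.append(("kernel", (start, end), (end - start) * time_per_block))
    return ops


def split_copy(nbytes, chunk_bytes, time_per_byte):
    """Split one memory copy into byte ranges [offset, offset + size)."""
    ops = []
    for offset in range(0, nbytes, chunk_bytes):
        size = min(chunk_bytes, nbytes - offset)
        ops.append(("memcpy", (offset, offset + size), size * time_per_byte))
    return ops


def simulate(arrivals):
    """Run (arrival_time, Task) pairs on the modeled engine.

    At every preemption point the highest-priority ready task is chosen.
    Returns {task name: time its first operation started}.
    Launch and copy overheads are not modeled."""
    arrivals = sorted(arrivals, key=lambda pair: pair[0])
    ready, first_start = [], {}
    now, i = 0, 0
    while i < len(arrivals) or ready:
        # Admit every task that has arrived by the current time.
        while i < len(arrivals) and arrivals[i][0] <= now:
            ready.append(arrivals[i][1])
            i += 1
        if not ready:
            now = arrivals[i][0]  # engine idle: jump to the next arrival
            continue
        task = max(ready, key=lambda t: t.priority)  # preemption point
        _kind, _range, duration = task.ops.popleft()
        first_start.setdefault(task.name, now)
        now += duration  # the operation itself is non-preemptive
        if not task.ops:
            ready.remove(task)
    return first_start


def high_priority_pending_time(blocks_per_launch, chunk_bytes):
    """Pending time of a short high-priority kernel that arrives while a
    low-priority task (400-byte copy, then a 1000-block kernel) is running."""
    low = Task("low", 0, deque(split_copy(400, chunk_bytes, 1)
                               + slice_kernel(1000, blocks_per_launch, 1)))
    high = Task("high", 1, deque(slice_kernel(50, 50, 1)))
    arrival = 450
    start = simulate([(0, low), (arrival, high)])["high"]
    return start - arrival


print(high_priority_pending_time(blocks_per_launch=1000, chunk_bytes=400))  # 950
print(high_priority_pending_time(blocks_per_launch=100, chunk_bytes=100))   # 50
```

Without slicing, the high-priority task arrives just after the low-priority copy finishes. It must then wait for the entire 1000-unit kernel. With 100-block sub-kernels, it waits at most for one sub-kernel. The chunk size is the design knob: smaller chunks bound blocking more tightly but, on real hardware, add more launch and copy overhead, which this model ignores.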

