Merge or Separate?: Multi-job Scheduling for OpenCL Kernels on CPU/GPU Platforms

Proceedings of the General Purpose GPUs. Pub Date: 2017-02-04. DOI: 10.1145/3038228.3038235

Y. Wen, M. O’Boyle

Citations: 35

Abstract

Computer systems are increasingly heterogeneous, with nodes consisting of CPUs and GPU accelerators. As such systems become mainstream, they move away from specialized high-performance single-application platforms to a more general setting with multiple concurrent application jobs. Determining how jobs should best be scheduled dynamically onto heterogeneous devices is non-trivial. In certain cases, performance is maximized if jobs are allocated to a single device; in others, sharing is preferable. In this paper, we present a runtime framework that schedules multi-user OpenCL tasks to their most suitable device in a CPU/GPU system. We use a machine learning-based predictive model at runtime to decide whether to merge OpenCL kernels or schedule them separately to the most appropriate devices, without the need for ahead-of-time profiling. We evaluate our approach over a wide range of workloads on two separate platforms. We consistently show significant performance and turnaround time improvements over the state of the art across programs, workloads, and platforms.


