Scheduling challenges and opportunities in integrated CPU+GPU processors

K. Dev, S. Reda
Venue: 2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia)
Publication date: 2016-10-01
DOI: https://doi.org/10.1145/2993452.2994307
Citations: 16

Abstract

Heterogeneous processors that integrate architecturally different devices (CPU and GPU) on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping. In this paper we first provide detailed infrared imaging results that show the impact of mapping decisions on the thermal and power profiles of CPU+GPU processors. Furthermore, we observe that runtime conditions, such as power and the CPU load from traditional workloads, also affect the mapping decision. To exploit our observations, we propose techniques to characterize OpenCL kernel workloads at run-time and map them to the appropriate device under time-varying physical conditions (i.e., the chip power limit) and CPU load conditions, in particular the number of CPU cores available to the OpenCL kernel. We implement our dynamic scheduler on a real CPU+GPU processor and evaluate it using various OpenCL benchmarks. Compared to the state-of-the-art kernel-level scheduling method, the proposed scheduler provides up to 31% and 10% improvements in runtime and energy, respectively.
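As a rough illustration of the kind of mapping decision the abstract describes, the Python sketch below picks the fastest device configuration that fits both a chip power limit and the number of free CPU cores. It is not the authors' scheduler: the Estimate record, the choose_mapping function, the assumption that a GPU run occupies one host CPU core, and every number in PROFILE are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Estimate:
    """Hypothetical profiled estimate for one kernel on one device configuration."""
    device: str       # "cpu" or "gpu"
    cpu_cores: int    # CPU cores this configuration occupies
    runtime_s: float  # estimated kernel runtime in seconds
    power_w: float    # estimated chip power in watts


def choose_mapping(estimates: List[Estimate],
                   power_limit_w: float,
                   free_cpu_cores: int) -> Optional[Estimate]:
    """Return the fastest configuration that respects the current
    power limit and the number of CPU cores left for the kernel."""
    feasible = [
        e for e in estimates
        if e.power_w <= power_limit_w and e.cpu_cores <= free_cpu_cores
    ]
    if not feasible:
        return None  # nothing fits; a real scheduler would need a fallback
    return min(feasible, key=lambda e: e.runtime_s)


# Made-up estimates for a single kernel (assumption, not measured data).
PROFILE = [
    Estimate("cpu", cpu_cores=4, runtime_s=1.0, power_w=30.0),
    Estimate("cpu", cpu_cores=2, runtime_s=1.8, power_w=20.0),
    Estimate("gpu", cpu_cores=1, runtime_s=1.4, power_w=25.0),  # 1 host core assumed
]
```

With these made-up numbers, choose_mapping(PROFILE, 35.0, 4) selects the 4-core CPU configuration, while lowering the power limit to 26.0 or reducing the free cores to 2 shifts the choice to the GPU. This mirrors the abstract's observation that the best device can change with runtime conditions and not only with kernel characteristics.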