Memory scheduling towards high-throughput cooperative heterogeneous computing

Hao Wang, Ripudaman Singh, M. Schulte, N. Kim
2014 23rd International Conference on Parallel Architecture and Compilation (PACT), published 2014-08-24. DOI: 10.1145/2628071.2628096
Citations: 20

Abstract

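Technology scaling enables the integration of the CPU and the GPU on a single chip for higher throughput and energy efficiency. In such a single-chip heterogeneous processor (SCHP), memory bandwidth is the most critical shared resource and must be managed judiciously to maximize throughput. Previous studies on memory scheduling for SCHPs have focused on the scenario where multiple applications run on the CPU and the GPU respectively, which we denote as a multitasking scenario. However, another increasingly important usage scenario for SCHPs is cooperative heterogeneous computing, where a single parallel application is partitioned between the CPU and the GPU such that the overall throughput is maximized. Previous studies on memory scheduling techniques for chip multiprocessors (CMPs) and SCHPs used the first-ready first-come-first-serve (FR-FCFS) scheduling policy as a weak baseline because of its fairness issues. In a cooperative heterogeneous computing scenario, however, we first demonstrate that FR-FCFS actually offers nearly 10% higher throughput than two recently proposed memory scheduling techniques designed for a multitasking scenario. Second, based on our analysis of memory access characteristics in a cooperative heterogeneous computing scenario, we propose various optimization techniques that, compared to FR-FCFS, improve row-buffer locality by 10%, reduce the service latency of CPU memory requests by 26%, and improve overall throughput by up to 8%.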
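For readers unfamiliar with the baseline, the FR-FCFS policy named in the abstract can be sketched as follows. This is a minimal, simplified sketch of the textbook policy, not the scheduler or the optimizations proposed in the paper. It assumes a single channel, treats a request as "ready" when it hits the currently open row of its bank, and ignores DRAM timing constraints; the `Request` fields and the `fr_fcfs_pick` and `schedule` functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical request record for illustration; real controllers track far more state.
@dataclass
class Request:
    arrival: int  # arrival order (e.g., cycle) at the memory controller
    bank: int     # target DRAM bank
    row: int      # target row within that bank
    source: str   # "CPU" or "GPU"; not used by FR-FCFS itself


def fr_fcfs_pick(queue: List[Request], open_rows: Dict[int, int]) -> Optional[Request]:
    """Select the next request under a simplified FR-FCFS policy.

    1. First-ready: prefer requests that hit the open row of their bank.
    2. First-come-first-serve: break ties by picking the oldest request.
    Simplification (assumption): "ready" means "row-buffer hit"; timing is ignored.
    """
    if not queue:
        return None

    def priority(r: Request):
        row_hit = open_rows.get(r.bank) == r.row
        # Tuples compare element-wise: row hits (0) sort before misses (1),
        # then older requests (smaller arrival) sort first.
        return (0 if row_hit else 1, r.arrival)

    return min(queue, key=priority)


def schedule(queue: List[Request], open_rows: Dict[int, int]) -> List[Request]:
    """Drain the queue one request at a time and return the service order."""
    pending = list(queue)
    rows = dict(open_rows)  # copy so the caller's state is not mutated
    order = []
    while pending:
        r = fr_fcfs_pick(pending, rows)
        pending.remove(r)
        rows[r.bank] = r.row  # serving a request leaves its row open (open-page policy)
        order.append(r)
    return order


if __name__ == "__main__":
    requests = [
        Request(arrival=0, bank=0, row=7, source="CPU"),
        Request(arrival=1, bank=0, row=5, source="GPU"),
        Request(arrival=2, bank=0, row=5, source="GPU"),
        Request(arrival=3, bank=1, row=2, source="CPU"),
    ]
    order = schedule(requests, open_rows={0: 5})
    print([r.arrival for r in order])  # → [1, 2, 0, 3]
```

In this example, the two younger GPU requests that hit the open row 5 of bank 0 are served before the oldest CPU request, which needs a different row. This reordering is how FR-FCFS raises row-buffer hit rates, and it is also why the policy is often criticized as unfair to requests with poor row locality.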