Mascar: Speeding up GPU warps by reducing memory pitstops

Ankit Sethia, D. Jamshidi, S. Mahlke
{"title":"Mascar: Speeding up GPU warps by reducing memory pitstops","authors":"Ankit Sethia, D. Jamshidi, S. Mahlke","doi":"10.1109/HPCA.2015.7056031","DOIUrl":null,"url":null,"abstract":"With the prevalence of GPUs as throughput engines for data parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory intensive workloads are more scarce. Therefore, managing access to these limited memory resources is a challenge for GPUs. We propose a novel Memory Aware Scheduling and Cache Access Re-execution (Mascar) system on GPUs tailored for better performance for memory intensive workloads. This scheme detects memory saturation and prioritizes memory requests among warps to enable better overlapping of compute and memory accesses. Furthermore, it enables limited re-execution of memory instructions to eliminate structural hazards in the memory subsystem and take advantage of cache locality in cases where requests cannot be sent to the memory due to memory saturation. Our results show that Mascar provides a 34% speedup over the baseline round-robin scheduler and 10% speedup over the state of the art warp schedulers for memory intensive workloads. Mascar also achieves an average of 12% savings in energy for such workloads.","PeriodicalId":6593,"journal":{"name":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","volume":"16 1","pages":"174-185"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"78","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2015.7056031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 78

Abstract

With the prevalence of GPUs as throughput engines for data parallel workloads, the landscape of GPU computing is changing significantly. Non-graphics workloads with high memory intensity and irregular access patterns are frequently targeted for acceleration on GPUs. While GPUs provide large numbers of compute resources, the resources needed for memory intensive workloads are more scarce. Therefore, managing access to these limited memory resources is a challenge for GPUs. We propose a novel Memory Aware Scheduling and Cache Access Re-execution (Mascar) system on GPUs tailored for better performance for memory intensive workloads. This scheme detects memory saturation and prioritizes memory requests among warps to enable better overlapping of compute and memory accesses. Furthermore, it enables limited re-execution of memory instructions to eliminate structural hazards in the memory subsystem and take advantage of cache locality in cases where requests cannot be sent to the memory due to memory saturation. Our results show that Mascar provides a 34% speedup over the baseline round-robin scheduler and 10% speedup over the state of the art warp schedulers for memory intensive workloads. Mascar also achieves an average of 12% savings in energy for such workloads.
马斯卡:通过减少内存站来加速GPU扭曲
随着GPU作为数据并行工作负载的吞吐量引擎的普及,GPU计算的前景正在发生重大变化。具有高内存强度和不规则访问模式的非图形工作负载通常以gpu上的加速为目标。虽然gpu提供了大量的计算资源,但内存密集型工作负载所需的资源更加稀缺。因此,管理对这些有限内存资源的访问对gpu来说是一个挑战。我们提出了一种新的内存感知调度和缓存访问重执行(Mascar)系统在gpu量身定制的内存密集型工作负载更好的性能。该方案检测内存饱和,并在warp之间优先考虑内存请求,以实现更好的计算和内存访问重叠。此外,它允许有限地重新执行内存指令,以消除内存子系统中的结构性危险,并在由于内存饱和而无法将请求发送到内存的情况下利用缓存局部性。我们的结果表明,对于内存密集型工作负载,Mascar比基准轮询调度器提供了34%的加速,比最先进的warp调度器提供了10%的加速。对于这样的工作负载,Mascar还平均节省了12%的能源。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信