Cache Capacity Aware Thread Scheduling for Irregular Memory Access on many-core GPGPUs

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC). Pub Date: 2013-04-29. DOI: 10.1109/ASPDAC.2013.6509618

Hsien-Kai Kuo, Ta-Kan Yen, B. Lai, Jing-Yang Jou


Citations: 12

Abstract

An on-chip shared cache is effective at alleviating the memory bottleneck in modern many-core systems such as GPGPUs. However, when numerous concurrent threads are scheduled on a GPGPU, a scheduling scheme that ignores cache capacity can cause severe cache contention among threads and, in turn, significant performance degradation. The diverse working sets of irregular applications make this contention even more serious. As a result, taking cache capacity into account has become a critical scheduling issue for GPGPUs. This paper formulates a Cache Capacity Aware Thread Scheduling Problem that captures the impact of cache capacity as well as different architectural considerations. After proving the problem NP-hard, the paper proposes two algorithms that perform cache capacity aware thread scheduling. Simulation results on Nvidia's Fermi configuration show that the proposed scheduling scheme effectively avoids cache contention, achieving on average a 44.7% reduction in cache misses and a 28.5% runtime improvement. The paper also shows that runtime can be improved by up to 62.5% for more complex applications.
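To make the idea of capacity-aware scheduling concrete, the following is a minimal illustrative sketch and not either of the paper's two algorithms, which the abstract does not describe. It assumes each thread's working-set size is known in advance, that working sets do not overlap (so sizes simply add), and it uses a generic first-fit-decreasing heuristic to group threads into batches whose combined working set fits in the shared cache. The function name `schedule_batches`, the thread IDs, and the capacity value are all hypothetical.

```python
from typing import Dict, List


def schedule_batches(working_sets: Dict[str, int], cache_capacity: int) -> List[List[str]]:
    """Group threads into batches whose summed working sets fit in the cache.

    Assumption (not from the paper): working sets are disjoint, so the
    cache footprint of a batch is the sum of its threads' working sets.
    Heuristic: first-fit decreasing, a standard bin-packing approximation.
    """
    batches: List[List[str]] = []
    free: List[int] = []  # remaining cache capacity per batch

    # Place the largest working sets first; break ties by thread ID.
    ordered = sorted(working_sets.items(), key=lambda kv: (-kv[1], kv[0]))
    for tid, size in ordered:
        if size > cache_capacity:
            # A thread that cannot fit at all runs alone in its own batch.
            batches.append([tid])
            free.append(0)
            continue
        for i, room in enumerate(free):
            if size <= room:
                batches[i].append(tid)
                free[i] -= size
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append([tid])
            free.append(cache_capacity - size)
    return batches


if __name__ == "__main__":
    # Hypothetical working-set sizes and cache capacity, in arbitrary units.
    sizes = {"t0": 300, "t1": 500, "t2": 200, "t3": 400, "t4": 100}
    for batch in schedule_batches(sizes, cache_capacity=768):
        print(batch, sum(sizes[t] for t in batch))
```

Batches would then run one after another, so that the threads executing concurrently never exceed the cache together. A capacity-agnostic scheduler, by contrast, would launch all five threads at once and overcommit the cache.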


Source journal

2013 18th Asia and South Pacific Design Automation Conference (ASP-DAC)

Self-citation rate

0.00%

Publication count