gpu内存超额订阅管理的协同页面预取与回收

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2020-05-01 DOI:10.1109/IPDPS47924.2020.00056

Qi Yu, B. Childers, Libo Huang, Cheng Qian, Hui Guo, Zhiying Wang

{"title":"gpu内存超额订阅管理的协同页面预取与回收","authors":"Qi Yu, B. Childers, Libo Huang, Cheng Qian, Hui Guo, Zhiying Wang","doi":"10.1109/IPDPS47924.2020.00056","DOIUrl":null,"url":null,"abstract":"The adoption of unified memory and demand paging has simplified programming and eased memory management in discrete GPUs. However, long-latency page faults cause significant performance overhead. While several software-based mechanisms have been proposed to address this issue, they suffer from inefficiency when page prefetching and pre-eviction are combined. For example, a state-of-the-art page replacement policy, hierarchical page eviction (HPE), is inefficient when prefetching is enabled. Furthermore, the prefetcher semantics-aware pre-evicting policy, which pre-evicts continuous pages in bulk the way they were brought in by the prefetcher, may cause thrashing for some irregular applications.In this paper, coordinated page prefetch and eviction (CPPE) is proposed to manage memory oversubscription in GPUs with unified memory. CPPE incorporates a modified page eviction policy, MHPE, and an access pattern-aware prefetcher in a fine-grained manner: MHPE is aware of prefetch semantics and the prefetcher prefetches pages according to access patterns in eviction candidates selected by MHPE. Simulation results show that, when the GPU memory is 75% and 50% oversubscribed, CPPE achieves an average speedup of 1.56x and 1.64x (up to 10.97x) over the state-of-the-art baseline, which combines a sequential-local prefetcher and LRU pre-eviction policy. CPPE also outperforms other approaches, including Random/reserved LRU with the sequential-local prefetcher, and simply disabling prefetching under memory oversubscription.","PeriodicalId":6805,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"44 1","pages":"472-482"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Coordinated Page Prefetch and Eviction for Memory Oversubscription Management in GPUs\",\"authors\":\"Qi Yu, B. Childers, Libo Huang, Cheng Qian, Hui Guo, Zhiying Wang\",\"doi\":\"10.1109/IPDPS47924.2020.00056\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The adoption of unified memory and demand paging has simplified programming and eased memory management in discrete GPUs. However, long-latency page faults cause significant performance overhead. While several software-based mechanisms have been proposed to address this issue, they suffer from inefficiency when page prefetching and pre-eviction are combined. For example, a state-of-the-art page replacement policy, hierarchical page eviction (HPE), is inefficient when prefetching is enabled. Furthermore, the prefetcher semantics-aware pre-evicting policy, which pre-evicts continuous pages in bulk the way they were brought in by the prefetcher, may cause thrashing for some irregular applications.In this paper, coordinated page prefetch and eviction (CPPE) is proposed to manage memory oversubscription in GPUs with unified memory. CPPE incorporates a modified page eviction policy, MHPE, and an access pattern-aware prefetcher in a fine-grained manner: MHPE is aware of prefetch semantics and the prefetcher prefetches pages according to access patterns in eviction candidates selected by MHPE. Simulation results show that, when the GPU memory is 75% and 50% oversubscribed, CPPE achieves an average speedup of 1.56x and 1.64x (up to 10.97x) over the state-of-the-art baseline, which combines a sequential-local prefetcher and LRU pre-eviction policy. CPPE also outperforms other approaches, including Random/reserved LRU with the sequential-local prefetcher, and simply disabling prefetching under memory oversubscription.\",\"PeriodicalId\":6805,\"journal\":{\"name\":\"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"44 1\",\"pages\":\"472-482\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS47924.2020.00056\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS47924.2020.00056","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 11

摘要

统一内存和需求分页的采用简化了离散gpu的编程和内存管理。但是，长延迟页面错误会导致显著的性能开销。虽然已经提出了几种基于软件的机制来解决这个问题，但是当页面预取和预移除结合在一起时，它们的效率很低。例如，在启用预取时，最先进的页替换策略分层页回收(HPE)效率很低。此外，预取器语义感知的预驱逐策略(按照预取器引入的方式批量预驱逐连续页面)可能会导致一些不规则应用程序出现抖动。本文提出了一种协调页面预取与回收(CPPE)的方法来管理统一内存gpu的内存超额分配。cpppe以细粒度的方式合并了修改后的页提取策略、MHPE和访问模式感知的预取器:MHPE知道预取语义，预取器根据MHPE选择的提取候选对象中的访问模式预取页面。仿真结果表明，当GPU内存超额占用75%和50%时，CPPE在结合了顺序本地预取器和LRU预回收策略的基础上实现了1.56倍和1.64倍(最高10.97倍)的平均加速。CPPE的性能也优于其他方法，包括使用顺序本地预取器的随机/保留LRU，以及在内存超额订阅下禁用预取。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Coordinated Page Prefetch and Eviction for Memory Oversubscription Management in GPUs

The adoption of unified memory and demand paging has simplified programming and eased memory management in discrete GPUs. However, long-latency page faults cause significant performance overhead. While several software-based mechanisms have been proposed to address this issue, they suffer from inefficiency when page prefetching and pre-eviction are combined. For example, a state-of-the-art page replacement policy, hierarchical page eviction (HPE), is inefficient when prefetching is enabled. Furthermore, the prefetcher semantics-aware pre-evicting policy, which pre-evicts continuous pages in bulk the way they were brought in by the prefetcher, may cause thrashing for some irregular applications.In this paper, coordinated page prefetch and eviction (CPPE) is proposed to manage memory oversubscription in GPUs with unified memory. CPPE incorporates a modified page eviction policy, MHPE, and an access pattern-aware prefetcher in a fine-grained manner: MHPE is aware of prefetch semantics and the prefetcher prefetches pages according to access patterns in eviction candidates selected by MHPE. Simulation results show that, when the GPU memory is 75% and 50% oversubscribed, CPPE achieves an average speedup of 1.56x and 1.64x (up to 10.97x) over the state-of-the-art baseline, which combines a sequential-local prefetcher and LRU pre-eviction policy. CPPE also outperforms other approaches, including Random/reserved LRU with the sequential-local prefetcher, and simply disabling prefetching under memory oversubscription.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

自引率

0.00%

发文量