A Simple Cache Coherence Scheme for Integrated CPU-GPU Systems

Ardhi Wiratama Baskara Yudha, Reza Pulungan, H. Hoffmann, Yan Solihin
{"title":"A Simple Cache Coherence Scheme for Integrated CPU-GPU Systems","authors":"Ardhi Wiratama Baskara Yudha, Reza Pulungan, H. Hoffmann, Yan Solihin","doi":"10.1109/DAC18072.2020.9218664","DOIUrl":null,"url":null,"abstract":"This paper presents a novel approach to accelerate applications running on integrated CPU-GPU systems. Many integrated CPU-GPU systems use cache-coherent shared memory to communicate. For example, after CPU produces data for GPU, the GPU may pull the data into its cache when it accesses the data. In such a pull-based approach, data resides in a shared cache until the GPU accesses it, resulting in long load latency on a first GPU access to a cache line. In this work, we propose a new, push-based, coherence mechanism that explicitly exploits the CPU and GPU producer-consumer relationship by automatically moving data from CPU to GPU last-level cache. The proposed mechanism results in a dramatic reduction of the GPU L2 cache miss rate in general, and a consequent increase in overall performance. Our experiments show that the proposed scheme can increase performance by up to 37%, with typical improvements in the 5–7% range. We find that even when tested applications do not benefit from the proposed approach, their performance does not decrease with our technique. While we demonstrate how the proposed scheme can co-exist with traditional cache coherence mechanisms, we argue that it could also be used as a simpler replacement for existing protocols.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"54 8","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 57th ACM/IEEE Design Automation Conference (DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DAC18072.2020.9218664","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

This paper presents a novel approach to accelerate applications running on integrated CPU-GPU systems. Many integrated CPU-GPU systems use cache-coherent shared memory to communicate. For example, after the CPU produces data for the GPU, the GPU may pull the data into its cache when it accesses the data. In such a pull-based approach, data resides in a shared cache until the GPU accesses it, resulting in long load latency on the first GPU access to a cache line. In this work, we propose a new, push-based coherence mechanism that explicitly exploits the CPU-GPU producer-consumer relationship by automatically moving data from the CPU to the GPU's last-level cache. The proposed mechanism dramatically reduces the GPU L2 cache miss rate in general, and consequently increases overall performance. Our experiments show that the proposed scheme can increase performance by up to 37%, with typical improvements in the 5–7% range. We find that even when tested applications do not benefit from the proposed approach, their performance does not decrease with our technique. While we demonstrate how the proposed scheme can co-exist with traditional cache coherence mechanisms, we argue that it could also be used as a simpler replacement for existing protocols.
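
The paper's push-based mechanism is a hardware coherence change and cannot be reproduced in user code. As a rough software-level analogue only, the sketch below uses CUDA unified memory and cudaMemPrefetchAsync to contrast on-demand ("pull") migration at the first GPU access with a producer-initiated "push" of the data toward the GPU before the consumer kernel runs. The kernel, buffer names, and sizes are illustrative assumptions, not taken from the paper.

#include <cstdio>
#include <cuda_runtime.h>

// Consumer kernel: reads the data the CPU produced (illustrative only).
__global__ void consume(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *in = nullptr, *out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));

    // CPU acts as the producer.
    for (int i = 0; i < n; ++i) in[i] = float(i);

    int dev = 0;
    cudaGetDevice(&dev);

    // "Push"-style: proactively move the produced buffers toward the GPU
    // before the consumer kernel touches them. Without these calls, the
    // data would migrate on demand ("pull") at the first GPU access,
    // paying the long first-access latency the abstract describes.
    cudaMemPrefetchAsync(in, n * sizeof(float), dev, 0);
    cudaMemPrefetchAsync(out, n * sizeof(float), dev, 0);

    consume<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[42] = %f\n", out[42]);
    cudaFree(in);
    cudaFree(out);
    return 0;
}

This analogue operates at page granularity through the unified-memory driver, whereas the proposed hardware scheme moves individual cache lines into the GPU last-level cache; the sketch is meant only to make the pull-versus-push contrast concrete.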