使用重用配置文件对gpgpu进行快速、准确和可扩展的内存建模

Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, N. Santhi, S. Eidenbenz
{"title":"使用重用配置文件对gpgpu进行快速、准确和可扩展的内存建模","authors":"Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, N. Santhi, S. Eidenbenz","doi":"10.1145/3392717.3392761","DOIUrl":null,"url":null,"abstract":"In this paper, we introduce an accurate and scalable memory modeling framework for General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem. That is Performance Prediction Tool-Kit for GPUs Cache Memories. PPT-GPU-Mem predicts the performance of different GPUs' cache memory hierarchy (L1 & L2) based on reuse profiles. We extract a memory trace for each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The memory trace extraction is architecture-independent and can be done on any available NVIDIA GPU. PPT-GPU-Mem can then model any NVIDIA GPU caches given their parameters and the extracted memory trace. We model Volta Tesla V100 and Turing TITAN RTX and validate our framework using different kernels from Polybench and Rodinia benchmark suites in addition to two deep learning applications from Tango DNN benchmark suite. We provide two models, MBRDP (Multiple Block Reuse Distance Profile) and OBRDP (One Block Reuse Distance Profile), with varying assumptions, accuracy, and speed. Our accuracy ranges from 92% to 99% for the different cache levels compared to real hardware while maintaining the scalability in producing the results. Finally, we illustrate that PPT-GPU-Mem can be used for design space exploration and for predicting the cache performance of future GPUs.","PeriodicalId":346687,"journal":{"name":"Proceedings of the 34th ACM International Conference on Supercomputing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles\",\"authors\":\"Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, N. Santhi, S. Eidenbenz\",\"doi\":\"10.1145/3392717.3392761\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we introduce an accurate and scalable memory modeling framework for General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem. That is Performance Prediction Tool-Kit for GPUs Cache Memories. PPT-GPU-Mem predicts the performance of different GPUs' cache memory hierarchy (L1 & L2) based on reuse profiles. We extract a memory trace for each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The memory trace extraction is architecture-independent and can be done on any available NVIDIA GPU. PPT-GPU-Mem can then model any NVIDIA GPU caches given their parameters and the extracted memory trace. We model Volta Tesla V100 and Turing TITAN RTX and validate our framework using different kernels from Polybench and Rodinia benchmark suites in addition to two deep learning applications from Tango DNN benchmark suite. We provide two models, MBRDP (Multiple Block Reuse Distance Profile) and OBRDP (One Block Reuse Distance Profile), with varying assumptions, accuracy, and speed. Our accuracy ranges from 92% to 99% for the different cache levels compared to real hardware while maintaining the scalability in producing the results. Finally, we illustrate that PPT-GPU-Mem can be used for design space exploration and for predicting the cache performance of future GPUs.\",\"PeriodicalId\":346687,\"journal\":{\"name\":\"Proceedings of the 34th ACM International Conference on Supercomputing\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 34th ACM International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3392717.3392761\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 34th ACM International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3392717.3392761","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

摘要

本文介绍了一种用于通用图形处理器(gpgpu)的精确且可扩展的内存建模框架,即PPT-GPU-Mem。这就是gpu缓存内存的性能预测工具包。pt - gpu - mem基于重用配置文件预测不同gpu缓存层次(L1和L2)的性能。我们使用最近发布的二进制检测工具NVBIT在每个GPU内核的生命周期中提取一次内存跟踪。内存跟踪提取与体系结构无关,可以在任何可用的NVIDIA GPU上完成。然后,pt -GPU- mem可以模拟任何NVIDIA GPU缓存,给定它们的参数和提取的内存跟踪。我们对Volta Tesla V100和Turing TITAN RTX进行了建模,并使用来自Polybench和Rodinia基准套件的不同内核以及来自Tango DNN基准套件的两个深度学习应用程序验证了我们的框架。我们提供了两个模型,MBRDP(多块重用距离配置文件)和OBRDP(一个块重用距离配置文件),具有不同的假设,精度和速度。与实际硬件相比,对于不同的缓存级别,我们的准确度在92%到99%之间,同时保持了生成结果的可伸缩性。最后,我们说明了pt - gpu - mem可以用于设计空间探索和预测未来gpu的缓存性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles
In this paper, we introduce an accurate and scalable memory modeling framework for General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem. That is Performance Prediction Tool-Kit for GPUs Cache Memories. PPT-GPU-Mem predicts the performance of different GPUs' cache memory hierarchy (L1 & L2) based on reuse profiles. We extract a memory trace for each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The memory trace extraction is architecture-independent and can be done on any available NVIDIA GPU. PPT-GPU-Mem can then model any NVIDIA GPU caches given their parameters and the extracted memory trace. We model Volta Tesla V100 and Turing TITAN RTX and validate our framework using different kernels from Polybench and Rodinia benchmark suites in addition to two deep learning applications from Tango DNN benchmark suite. We provide two models, MBRDP (Multiple Block Reuse Distance Profile) and OBRDP (One Block Reuse Distance Profile), with varying assumptions, accuracy, and speed. Our accuracy ranges from 92% to 99% for the different cache levels compared to real hardware while maintaining the scalability in producing the results. Finally, we illustrate that PPT-GPU-Mem can be used for design space exploration and for predicting the cache performance of future GPUs.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信