Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, N. Santhi, S. Eidenbenz
Proceedings of the 34th ACM International Conference on Supercomputing (ICS '20). Published 2020-06-29. DOI: 10.1145/3392717.3392761.
Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles
In this paper, we introduce PPT-GPU-Mem, an accurate and scalable memory modeling framework for General Purpose Graphics Processing Units (GPGPUs): the Performance Prediction Toolkit for GPU Cache Memories. PPT-GPU-Mem predicts the performance of a GPU's cache memory hierarchy (L1 & L2) based on reuse profiles. We extract a memory trace for each GPU kernel once in its lifetime using the recently released binary instrumentation tool, NVBIT. The memory trace extraction is architecture-independent and can be done on any available NVIDIA GPU. PPT-GPU-Mem can then model the caches of any NVIDIA GPU given their parameters and the extracted memory trace. We model the Volta Tesla V100 and the Turing TITAN RTX and validate our framework using kernels from the Polybench and Rodinia benchmark suites, in addition to two deep learning applications from the Tango DNN benchmark suite. We provide two models, MBRDP (Multiple Block Reuse Distance Profile) and OBRDP (One Block Reuse Distance Profile), with varying assumptions, accuracy, and speed. Compared to real hardware, our accuracy ranges from 92% to 99% across the different cache levels, while remaining scalable in producing the results. Finally, we illustrate that PPT-GPU-Mem can be used for design space exploration and for predicting the cache performance of future GPUs.
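The abstract's MBRDP and OBRDP models are built on reuse distance profiles. As a rough, hypothetical illustration of the underlying idea only (not the paper's actual algorithm, which operates on GPU memory traces per thread block), the sketch below computes per-access LRU reuse distances for a toy address trace and estimates the hit rate of an idealized fully associative LRU cache, where an access hits iff its reuse distance is smaller than the cache's capacity in lines.

```python
from collections import OrderedDict

def reuse_distance_profile(trace):
    """Compute the LRU reuse distance of each access in a memory trace.

    The reuse distance of an access is the number of *distinct* addresses
    touched since the previous access to the same address (inf on first use).
    """
    last_seen = OrderedDict()  # addresses in least- to most-recently-used order
    distances = []
    for addr in trace:
        if addr in last_seen:
            # Count distinct addresses accessed after the last use of `addr`.
            order = list(last_seen)
            distances.append(len(order) - order.index(addr) - 1)
            last_seen.move_to_end(addr)  # `addr` becomes most recently used
        else:
            distances.append(float("inf"))  # cold (compulsory) miss
            last_seen[addr] = None
    return distances

def lru_hit_rate(distances, cache_lines):
    """Estimate the hit rate of a fully associative LRU cache: an access
    hits iff its reuse distance is below the cache capacity in lines."""
    hits = sum(1 for d in distances if d < cache_lines)
    return hits / len(distances)

if __name__ == "__main__":
    trace = ["A", "B", "C", "A", "B", "D", "A"]
    dists = reuse_distance_profile(trace)
    print(dists)                  # [inf, inf, inf, 2, 2, inf, 2]
    print(lru_hit_rate(dists, 4)) # 3 of 7 accesses hit in a 4-line cache
```

This is the key scalability property the framework exploits: once the reuse distance profile is extracted from a trace, the hit rate of any cache size can be estimated from the profile alone, without re-simulating the trace for each configuration.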