Accuracy Improvement of Memory System Simulation for Modern Shared Memory Processor

Yuetsu Kodama, Tetsuya Odajima, A. Asato, M. Sato
DOI: 10.1145/3368474.3368483
Published in: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020-01-15
Citations: 6

Abstract

To support early application development for the supercomputer Fugaku, RIKEN has developed a processor simulator based on the general-purpose processor simulator gem5. It does not simulate the actual hardware of a Fugaku processor. However, we believe it achieves sufficient accuracy because it simulates the out-of-order instruction pipeline at cycle level and its out-of-order resource parameters have been tuned in detail. Estimating a program's execution time accurately requires simulating not only the instruction execution time but also the access time of the cache memory hierarchy. We therefore extended gem5 so that the performance of its cache memory hierarchy matches that of a Fugaku processor. Our aim is to estimate the execution cycles of a single-node application on a Fugaku processor accurately enough to enable relative evaluation and application tuning. In this paper, we describe the implementation of the simulator and verify its accuracy against a Fugaku processor test chip. Across a total of 46 kernel benchmarks, the difference was 13% or less for 85% of the kernels. In multithreaded execution of the Stream Triad benchmark, performance scaled with the number of threads, and the simulator achieved over 80% of memory throughput with sufficient accuracy.
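The accuracy figure quoted above ("13% or less for 85% of the kernels") can be illustrated with a minimal sketch. The abstract does not define how the difference is computed, so this assumes it is the absolute difference between simulated and measured cycle counts, relative to the test-chip measurement. The function names and the cycle counts below are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def relative_difference(simulated_cycles: float, measured_cycles: float) -> float:
    """Absolute simulated-vs-measured difference, relative to the measurement.

    Assumption: the test-chip measurement is the reference value.
    """
    if measured_cycles <= 0:
        raise ValueError("measured_cycles must be positive")
    return abs(simulated_cycles - measured_cycles) / measured_cycles


def fraction_within(
    simulated: Sequence[float], measured: Sequence[float], threshold: float
) -> float:
    """Fraction of kernels whose relative difference is at most `threshold`."""
    if len(simulated) != len(measured) or not simulated:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for s, m in zip(simulated, measured)
        if relative_difference(s, m) <= threshold
    )
    return hits / len(simulated)


# Made-up cycle counts for four hypothetical kernels (illustration only).
simulated = [1050.0, 880.0, 1300.0, 2000.0]
measured = [1000.0, 1000.0, 1000.0, 2000.0]

# Three of the four kernels fall within a 13% threshold.
print(fraction_within(simulated, measured, 0.13))
```

With real data, `simulated` would hold the simulator's cycle counts and `measured` the test-chip counts for the same kernels; a result of 0.85 or more at a threshold of 0.13 would correspond to the claim in the abstract under the stated assumption.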