A mechanistic model of memory level parallelism fed with cache miss rates

2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM). Pub Date: 2017-08-01. DOI: 10.1109/PACRIM.2017.8121918

Qin Wang, Kecheng Ji, Ming Ling, Longxing Shi


Citations: 5

Abstract


A mechanistic model of memory level parallelism fed with cache miss rates

Non-blocking caches, which are common in modern out-of-order processors, can handle multiple outstanding memory requests simultaneously to reduce the penalty of long-latency cache misses. Memory level parallelism (MLP), defined as the number of memory requests concurrently held by the Miss Status Handling Registers (MSHRs), is an indispensable factor in estimating cache performance. To obtain MLP efficiently, previous studies oversimplified the factors that must be considered when constructing analytical models, in particular the influence of the cache miss rate. By quantifying these cache miss rate effects, this paper proposes a mechanistic model of memory level parallelism that is more accurate than existing work. Fifteen benchmarks, chosen from Mobybench 2.0, Mibench 1.0 and MediaBench II, are used to evaluate the accuracy of the model. Compared with Gem5 cycle-accurate simulation results, the largest root mean square error is less than 11%, while the average is around 7%. Meanwhile, the cache performance forecasting process is about 38 times faster than Gem5 cycle-accurate simulation.
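To make the MLP definition above concrete, the following is a minimal sketch of how average MLP could be measured from a trace of miss lifetimes. It is not the paper's analytical model, which the abstract does not spell out. It assumes one common convention: MLP is averaged only over cycles in which at least one miss is outstanding. The function name `average_mlp` and the `(issue_cycle, completion_cycle)` trace format are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def average_mlp(misses: List[Tuple[int, int]]) -> float:
    """Average number of outstanding misses, taken over the cycles in which
    at least one miss is outstanding (an assumed convention).

    Each miss is a half-open interval (issue_cycle, completion_cycle),
    standing in for the time an MSHR entry is occupied.
    """
    events = []
    for issue, complete in misses:
        if complete <= issue:
            raise ValueError("completion cycle must be after issue cycle")
        events.append((issue, 1))      # MSHR entry allocated
        events.append((complete, -1))  # MSHR entry released
    # Sorting places releases before allocations at the same cycle,
    # which matches the half-open interval convention.
    events.sort()

    outstanding = 0   # misses currently in flight
    busy_cycles = 0   # cycles with at least one miss in flight
    miss_cycles = 0   # sum over busy cycles of the in-flight count
    prev_cycle = None
    for cycle, delta in events:
        if prev_cycle is not None and outstanding > 0:
            span = cycle - prev_cycle
            busy_cycles += span
            miss_cycles += span * outstanding
        outstanding += delta
        prev_cycle = cycle

    return miss_cycles / busy_cycles if busy_cycles else 0.0


if __name__ == "__main__":
    # Two 100-cycle misses that overlap for 50 cycles.
    print(average_mlp([(0, 100), (50, 150)]))
```

In this sketch, two fully overlapping misses give an MLP of 2, while two misses that never overlap give an MLP of 1, which is why a higher MLP corresponds to a lower effective penalty per miss.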
