Title: A mechanistic model of memory level parallelism fed with cache miss rates
Authors: Qin Wang, Kecheng Ji, Ming Ling, Longxing Shi
Venue: 2017 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM)
Publication date: 2017-08-01
DOI: 10.1109/PACRIM.2017.8121918 (https://doi.org/10.1109/PACRIM.2017.8121918)
Citations: 5
Abstract
Non-blocking caches, which are common in modern out-of-order processors, can handle multiple outstanding memory requests simultaneously to reduce the penalty of long-latency cache misses. Memory level parallelism (MLP), the number of memory requests concurrently held by Miss Status Handling Registers (MSHRs), is an indispensable factor in estimating cache performance. To obtain MLP estimates efficiently, previous studies oversimplified the factors that must be considered when constructing analytical models, especially the influence of the cache miss rate. By quantifying these cache miss rate effects, this paper proposes a mechanistic model of memory level parallelism that is more accurate than existing work. Fifteen benchmarks, chosen from Mobybench 2.0, MiBench 1.0 and MediaBench II, are used to evaluate the accuracy of the model. Compared with Gem5 cycle-accurate simulation results, the largest root mean square error is less than 11%, while the average is around 7%. Meanwhile, cache performance forecasting is about 38 times faster than Gem5 cycle-accurate simulation.
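The notion of MLP as the number of requests concurrently held by MSHRs can be illustrated with a small trace-based sketch. This is not the paper's mechanistic model. It assumes a hypothetical miss trace of `(issue_cycle, latency)` pairs, a fixed number of MSHRs, and a common averaging convention (MSHR-occupied cycles divided by cycles with at least one outstanding miss) that the abstract itself does not specify. The function name `average_mlp` is made up for illustration.

```python
import heapq


def average_mlp(misses, num_mshrs):
    """Average number of outstanding misses over cycles with >= 1 miss.

    misses: list of (issue_cycle, latency) pairs, sorted by issue_cycle
            (a hypothetical trace format, not taken from the paper).
    num_mshrs: number of MSHR entries; a miss stalls when all are busy.
    """
    busy = []        # min-heap of completion cycles of occupied MSHRs
    intervals = []   # (start, end) cycles during which an MSHR is held

    for issue, latency in misses:
        # Free every MSHR whose miss has completed by the issue cycle.
        while busy and busy[0] <= issue:
            heapq.heappop(busy)
        start = issue
        if len(busy) == num_mshrs:
            # All MSHRs busy: wait until the earliest one is released.
            start = heapq.heappop(busy)
        end = start + latency
        heapq.heappush(busy, end)
        intervals.append((start, end))

    # Total MSHR-occupied cycles, summed over all misses.
    occupied = sum(end - start for start, end in intervals)

    # Cycles with at least one outstanding miss: length of the interval union.
    active = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                active += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        active += cur_end - cur_start

    return occupied / active if active else 0.0


if __name__ == "__main__":
    trace = [(0, 100), (10, 100), (20, 100)]
    print(average_mlp(trace, num_mshrs=4))  # overlapping misses
    print(average_mlp(trace, num_mshrs=1))  # fully serialized misses
```

With enough MSHRs the three misses overlap and the average exceeds 1. With a single MSHR they serialize and the average is exactly 1, which is the blocking-cache case that non-blocking caches are meant to improve on.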