DStride: data-cache miss-address-based stride prefetching scheme for multimedia processors

Proceedings 6th Australasian Computer Systems Architecture Conference. ACSAC 2001. Pub Date: 2001-01-29. DOI: 10.1109/ACAC.2001.903360

G. Hariprakash, R. Achutharaman, A. Omondi


Citations: 6

Abstract

Prefetching reduces cache miss latency by moving data up the memory hierarchy before it is actually needed. Recent hardware-based stride prefetching techniques mostly rely on processor pipeline information (e.g., the program counter and the branch prediction table) for prediction. Continuing developments in processor microarchitecture drastically change core pipeline design, so existing hardware-based stride prefetching techniques must be adapted to each new processor architecture. In this paper we present a new hardware-based stride prefetching technique, called DStride, that is independent of changes in processor pipeline design. In this design, the first-level data cache miss address stream is used for stride prediction. The miss addresses are separated into a load stream and a store stream to increase the efficiency of the predictor, and each stream is checked separately against its recent miss addresses to detect strides. Detected steady strides are maintained in a table that also performs look-ahead stride prefetching when the processor's stride reference rate is higher than the prefetch request service rate. We evaluated our design on multimedia workloads using execution-driven simulation with the SimpleScalar toolset. Our experiments show that DStride is very effective in reducing overall pipeline stalls due to cache miss latency, especially for stride-intensive applications such as multimedia workloads.
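The abstract gives no implementation details, so the following is only a rough sketch of the general idea it describes: detecting strides from the miss address stream alone, with loads and stores tracked separately. The class names (`StrideDetector`, `DStrideSketch`), the history length, the rule that three equally spaced misses make a stride "steady", and the fixed look-ahead depth are all assumptions made for illustration, not the authors' design. The sketch also does not model the cache itself; it only reports which addresses would be prefetched.

```python
from collections import deque


class StrideDetector:
    """Toy stride detector for ONE miss stream (hypothetical, not the paper's design)."""

    def __init__(self, history=8, lookahead=2):
        self.recent = deque(maxlen=history)  # recent miss addresses (assumed size)
        self.steady = {}                     # expected next miss address -> stride
        self.lookahead = lookahead           # assumed fixed look-ahead depth

    def on_miss(self, addr):
        """Record one miss address and return the addresses to prefetch."""
        prefetches = []
        stride = self.steady.pop(addr, None)
        if stride is not None:
            # The miss landed where a steady stride predicted, so the earlier
            # prefetch was too late: keep tracking and run further ahead.
            self.steady[addr + stride] = stride
            prefetches = [addr + k * stride for k in range(1, self.lookahead + 1)]
        else:
            # Look for earlier misses b and b - d such that (b - d, b, addr)
            # are equally spaced; that is treated here as a steady stride.
            for b in reversed(self.recent):
                d = addr - b
                if d != 0 and (b - d) in self.recent:
                    self.steady[addr + d] = d
                    prefetches = [addr + d]
                    break
        self.recent.append(addr)
        return prefetches


class DStrideSketch:
    """Keeps load and store miss streams apart, as the abstract describes."""

    def __init__(self):
        self.streams = {"load": StrideDetector(), "store": StrideDetector()}

    def on_miss(self, kind, addr):
        # kind is "load" or "store"; each stream has its own detector.
        return self.streams[kind].on_miss(addr)
```

In this sketch, feeding load misses at `0x1000`, `0x1040`, and `0x1080` establishes a stride of `0x40` and requests a prefetch of `0x10C0`. A store miss in between does not disturb the load stream, and no program counter is consulted at any point, which is the property the abstract emphasizes.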
