Two-Level Data Prefetching

2007 25th International Conference on Computer Design. Pub Date: 2007-10-01. DOI: 10.1109/ICCD.2007.4601908

Fei Gao, Hanyu Cui, S. Sair


Citations: 1

Abstract

Two-level data prefetching

Data prefetching has been shown to be an effective tool in hiding part of the latency associated with cache misses in modern processors. Traditionally, data prefetchers fetch data into a small prefetch buffer near the L1 for low latency, or the L2 cache for greater coverage and less cache pollution. However, with the L1-L2 cache speed gap growing, significant performance gains can be obtained if the data prefetcher can operate as aggressively as an L2-level prefetcher but with the fast hit times of an L1-level prefetcher. In this paper, we propose a prefetching framework where an L1-level prefetcher and an L2-level prefetcher work cooperatively to reduce the average access time more than either one alone can. We evaluate several design alternatives suited to perform synergistically under different workloads. From the insight we gather from this analysis, we propose a confidence-based adaptive prefetcher that can improve prefetch efficiency significantly with judicious use of available bus bandwidth. Our results show that for certain prefetcher combinations, two-level prefetching can achieve the cumulative speedup attained from either prefetcher alone. Furthermore, when compared to other two-level prefetching models, the adaptive design provides similar speedups with appreciably less bus traffic.
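
The abstract does not describe how the confidence-based adaptive prefetcher works internally, so the following is only a minimal sketch of the general idea of gating prefetches by confidence, not the authors' design. It assumes a simple per-PC stride detector (a swapped-in technique chosen for brevity) with a saturating confidence counter, and it assumes that low-confidence predictions go to the L2 while only high-confidence ones go to a buffer near the L1. The class name, the thresholds `L2_THRESHOLD` and `L1_THRESHOLD`, and the counter width are all hypothetical.

```python
from dataclasses import dataclass

MAX_CONF = 3      # hypothetical ceiling of a 2-bit saturating counter
L2_THRESHOLD = 1  # hypothetical: some confidence is enough to prefetch into L2
L1_THRESHOLD = 3  # hypothetical: only full confidence prefetches near L1


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0


class ConfidencePrefetcher:
    """Toy stride prefetcher that chooses a target level by confidence."""

    def __init__(self):
        self.table = {}   # load PC -> StrideEntry
        self.issued = []  # every (target_level, address) request issued

    def access(self, pc, addr):
        """Observe one load; return a prefetch request or None."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return None

        stride = addr - entry.last_addr
        entry.last_addr = addr

        if stride != 0 and stride == entry.stride:
            # Prediction confirmed: raise confidence, saturating at MAX_CONF.
            entry.confidence = min(MAX_CONF, entry.confidence + 1)
        else:
            # Misprediction: lower confidence and issue nothing, which is
            # where the bus-traffic saving in this sketch comes from.
            entry.confidence = max(0, entry.confidence - 1)
            if entry.confidence == 0:
                entry.stride = stride  # retrain on the new stride
            return None

        if entry.confidence >= L1_THRESHOLD:
            target = "L1"
        elif entry.confidence >= L2_THRESHOLD:
            target = "L2"
        else:
            return None

        request = (target, addr + entry.stride)
        self.issued.append(request)
        return request
```

In this sketch a load stream with a steady 64-byte stride first triggers L2 prefetches and is promoted to L1 prefetches once the counter saturates, while a single irregular access suppresses the prefetch for that step instead of spending bus bandwidth on an unconfirmed guess.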
