Graph Prefetching Using Data Structure Knowledge

Proceedings of the 2016 International Conference on Supercomputing. Published 2016-06-01. DOI: 10.1145/2925426.2926254

S. Ainsworth, Timothy M. Jones

Citations: 57

Abstract

Searches on large graphs are heavily memory-latency bound, as a result of many high-latency DRAM accesses. Due to the highly irregular nature of the access patterns involved, caches and prefetchers, both hardware and software, perform poorly on graph workloads, leaving the CPU stalled for the majority of the time. However, in many cases the data access pattern is well defined and predictable in advance, with many patterns falling into a small set of simple forms. Although existing implicit prefetchers bring little benefit, a prefetcher armed with knowledge of the data structures and access patterns could accurately anticipate an application's traversals and bring in the appropriate data. This paper presents the design of an explicitly configured prefetcher that improves performance for breadth-first searches and sequential iteration on the efficient and commonly used compressed sparse row (CSR) graph format. By snooping L1 cache accesses from the core and reacting to data returned from its own prefetches, the prefetcher can schedule timely loads of data in advance of the application needing it. For a range of applications and graph sizes, our prefetcher achieves average speedups of 2.3x, and up to 3.3x, with little impact on memory bandwidth requirements.
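The abstract names the access pattern but not the prefetcher's internals. As a rough illustration only, the following Python sketch models the chain of dependent loads that a breadth-first search over a CSR graph exposes: a sequential read of the work list, then the vertex's offsets, then its edge-list entries, then the visited data of each neighbour. The names `prefetch_chain` and `bfs_with_prefetch` and the look-ahead parameter `distance` are hypothetical choices made here, not the paper's interface. The model assumes the prefetcher fires on each work-list read and follows the values its own prefetches would return. It is not cycle-accurate.

```python
from typing import List, Tuple

# Hypothetical sketch: function names and the look-ahead policy are
# assumptions for illustration, not the paper's actual design.
# CSR graph: the neighbours of vertex v are edges[offsets[v]:offsets[v + 1]],
# so offsets has num_vertices + 1 entries.

Request = Tuple[str, int]  # (array name, element index) of a modelled prefetch


def prefetch_chain(work_list: List[int], offsets: List[int], edges: List[int],
                   head: int, distance: int) -> List[Request]:
    """Model the prefetches triggered when the core reads work_list[head].

    Each address depends on the value the previous prefetch would return,
    so the chain runs work list -> offsets -> edges -> visited.
    """
    target = head + distance
    if target >= len(work_list):  # vertex not enqueued yet: nothing to fetch
        return []
    requests: List[Request] = [("work_list", target)]
    v = work_list[target]                    # value returned by the work-list prefetch
    requests += [("offsets", v), ("offsets", v + 1)]
    start, end = offsets[v], offsets[v + 1]  # values returned by the offsets prefetch
    requests += [("edges", i) for i in range(start, end)]
    requests += [("visited", edges[i]) for i in range(start, end)]  # neighbour IDs
    return requests


def bfs_with_prefetch(offsets: List[int], edges: List[int], source: int,
                      distance: int = 1) -> Tuple[List[int], List[Request]]:
    """Plain CSR breadth-first search that also logs the modelled prefetches."""
    visited = [False] * (len(offsets) - 1)
    visited[source] = True
    work_list = [source]
    prefetches: List[Request] = []
    head = 0
    while head < len(work_list):
        # Model only the queue as it exists now: entries not yet enqueued are unknown.
        prefetches += prefetch_chain(work_list, offsets, edges, head, distance)
        v = work_list[head]                        # sequential work-list read
        head += 1
        for n in edges[offsets[v]:offsets[v + 1]]:  # offsets read, then edge list
            if not visited[n]:                     # data-dependent visited read
                visited[n] = True
                work_list.append(n)
    return work_list, prefetches


if __name__ == "__main__":
    # Edges: 0->1, 0->2, 1->3, 2->3, 2->4, 4->0
    offsets = [0, 2, 3, 5, 5, 6]
    edges = [1, 2, 3, 3, 4, 0]
    order, reqs = bfs_with_prefetch(offsets, edges, source=0, distance=1)
    print(order)     # [0, 1, 2, 3, 4]
    print(reqs[:3])  # [('work_list', 2), ('offsets', 2), ('offsets', 3)]
```

In this toy model, a larger `distance` gives loads more time to arrive but makes it more likely that the target vertex has not been enqueued yet. The sketch ignores cache lines, timing and memory bandwidth, all of which a real hardware prefetcher has to manage.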

