GraphDEAR: An Accelerator Architecture for Exploiting Cache Locality in Graph Analytics Applications

2022 30th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). Pub Date: 2022-03-01. DOI: 10.1109/pdp55904.2022.00029

Siyi Hu, Masaaki Kondo, Yuan He, Ryuichi Sakamoto, Haotong Zhang, Jun Zhou, Hiroshi Nakamura


Citations: 1

Abstract

Data structures are key in edge computing, where ubiquitous devices continuously generate various types of data. Among common data structures, graphs express relationships and dependencies among human identities, objects, and locations, and they are expected to become one of the most important data infrastructures in the near future. However, because graph processing often requires random accesses to vast memory spaces, conventional cache-based memory hierarchies cannot perform efficiently. To alleviate such memory access bottlenecks in graph processing, we present a solution that performs vertex access scheduling and edge array re-ordering in parallel with the execution of the graph processing application, improving both the temporal and spatial locality of memory accesses, especially for edge-centric graphs, which are a popular means of handling dynamic graphs. Our proposed architecture is evaluated through both trace-based cache simulations and cycle-accurate FPGA-based prototyping. Evaluation results show that our proposal has the potential to significantly reduce Last Level Cache (LLC) Misses Per Kilo-Instructions (MPKI), by 56.27% on average.
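To make the locality argument concrete, the following is a minimal sketch of a trace-based cache experiment. It is not the GraphDEAR design: the fully associative LRU cache, the random synthetic graph, the cache and line sizes, and the figure of 10 instructions per edge are all assumptions chosen for illustration, and a plain sort by source vertex stands in for edge array re-ordering. It only shows that the order in which an edge array is traversed can change the miss count, and how MPKI is computed from misses and instructions.

```python
import random
from collections import OrderedDict


def count_misses(addresses, num_lines, line_size):
    """Count misses of a fully associative LRU cache over an address trace."""
    cache = OrderedDict()  # keys are cache line numbers, ordered by recency
    misses = 0
    for addr in addresses:
        line = addr // line_size
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def vertex_trace(edges):
    """Edge-centric sweep: each edge touches its source and destination vertex."""
    trace = []
    for src, dst in edges:
        trace.append(src)
        trace.append(dst)
    return trace


def mpki(misses, instructions):
    """Misses per kilo-instructions."""
    return misses * 1000.0 / instructions


# Synthetic graph with random endpoints (an assumption, not a real dataset).
rng = random.Random(0)
NUM_VERTICES = 4096
edges = [(rng.randrange(NUM_VERTICES), rng.randrange(NUM_VERTICES))
         for _ in range(20000)]

# Hypothetical cost model: 10 instructions executed per processed edge.
instructions = 10 * len(edges)

unordered = count_misses(vertex_trace(edges), num_lines=64, line_size=8)
reordered = count_misses(vertex_trace(sorted(edges)), num_lines=64, line_size=8)

print("MPKI, original edge order:", mpki(unordered, instructions))
print("MPKI, edges sorted by source:", mpki(reordered, instructions))
```

In this toy setup the sorted edge array makes source-vertex accesses nearly sequential, so consecutive edges reuse the same cache line, while destination accesses remain scattered. The numbers it prints are specific to these assumed parameters and are not comparable to the 56.27% figure reported above.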

