GraphPIM:在图计算框架中启用指令级PIM卸载

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2017-02-01 DOI:10.1109/HPCA.2017.54

Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, Hyesoon Kim

{"title":"GraphPIM:在图计算框架中启用指令级PIM卸载","authors":"Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, Hyesoon Kim","doi":"10.1109/HPCA.2017.54","DOIUrl":null,"url":null,"abstract":"With the emergence of data science, graph computing has become increasingly important these days. Unfortunately, graph computing typically suffers from poor performance when mapped to modern computing systems because of the overhead of executing atomic operations and inefficient utilization of the memory subsystem. Meanwhile, emerging technologies, such as Hybrid Memory Cube (HMC), enable the processing-in-memory (PIM) functionality with offloading operations at an instruction level. Instruction offloading to the PIM side has considerable potentials to overcome the performance bottleneck of graph computing. Nevertheless, this functionality for graph workloads has not been fully explored, and its applications and shortcomings have not been well identified thus far. In this paper, we present GraphPIM, a full-stack solution for graph computing that achieves higher performance using PIM functionality. We perform an analysis on modern graph workloads to assess the applicability of PIM offloading and present hardware and software mechanisms to efficiently make use of the PIM functionality. Following the real-world HMC 2.0 specification, GraphPIM provides performance benefits for graph applications without any user code modification or ISA changes. In addition, we propose an extension to PIM operations that can further bring performance benefits for more graph applications. The evaluation results show that GraphPIM achieves up to a 2.4X speedup with a 37% reduction in energy consumption.","PeriodicalId":118950,"journal":{"name":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"230","resultStr":"{\"title\":\"GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks\",\"authors\":\"Lifeng Nai, Ramyad Hadidi, Jaewoong Sim, Hyojong Kim, Pranith Kumar, Hyesoon Kim\",\"doi\":\"10.1109/HPCA.2017.54\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the emergence of data science, graph computing has become increasingly important these days. Unfortunately, graph computing typically suffers from poor performance when mapped to modern computing systems because of the overhead of executing atomic operations and inefficient utilization of the memory subsystem. Meanwhile, emerging technologies, such as Hybrid Memory Cube (HMC), enable the processing-in-memory (PIM) functionality with offloading operations at an instruction level. Instruction offloading to the PIM side has considerable potentials to overcome the performance bottleneck of graph computing. Nevertheless, this functionality for graph workloads has not been fully explored, and its applications and shortcomings have not been well identified thus far. In this paper, we present GraphPIM, a full-stack solution for graph computing that achieves higher performance using PIM functionality. We perform an analysis on modern graph workloads to assess the applicability of PIM offloading and present hardware and software mechanisms to efficiently make use of the PIM functionality. Following the real-world HMC 2.0 specification, GraphPIM provides performance benefits for graph applications without any user code modification or ISA changes. In addition, we propose an extension to PIM operations that can further bring performance benefits for more graph applications. The evaluation results show that GraphPIM achieves up to a 2.4X speedup with a 37% reduction in energy consumption.\",\"PeriodicalId\":118950,\"journal\":{\"name\":\"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-02-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"230\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2017.54\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2017.54","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 230

摘要

随着数据科学的出现，图计算变得越来越重要。不幸的是，当映射到现代计算系统时，由于执行原子操作的开销和对内存子系统的低效利用，图计算的性能通常很差。同时，混合内存立方体(HMC)等新兴技术通过在指令级卸载操作实现了内存处理(PIM)功能。指令卸载到PIM端有很大的潜力来克服图计算的性能瓶颈。然而，图形工作负载的这种功能还没有得到充分的探索，到目前为止，它的应用和缺点还没有得到很好的识别。在本文中，我们提出了GraphPIM，一个使用PIM功能实现更高性能的图计算全栈解决方案。我们对现代图形工作负载进行了分析，以评估PIM卸载的适用性，并介绍了有效利用PIM功能的硬件和软件机制。遵循现实世界的HMC 2.0规范，GraphPIM为图形应用程序提供了性能优势，而无需修改任何用户代码或更改ISA。此外，我们提出了PIM操作的扩展，可以进一步为更多的图形应用程序带来性能优势。评估结果表明，GraphPIM实现了高达2.4倍的加速，能耗降低了37%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

GraphPIM: Enabling Instruction-Level PIM Offloading in Graph Computing Frameworks

With the emergence of data science, graph computing has become increasingly important these days. Unfortunately, graph computing typically suffers from poor performance when mapped to modern computing systems because of the overhead of executing atomic operations and inefficient utilization of the memory subsystem. Meanwhile, emerging technologies, such as Hybrid Memory Cube (HMC), enable the processing-in-memory (PIM) functionality with offloading operations at an instruction level. Instruction offloading to the PIM side has considerable potentials to overcome the performance bottleneck of graph computing. Nevertheless, this functionality for graph workloads has not been fully explored, and its applications and shortcomings have not been well identified thus far. In this paper, we present GraphPIM, a full-stack solution for graph computing that achieves higher performance using PIM functionality. We perform an analysis on modern graph workloads to assess the applicability of PIM offloading and present hardware and software mechanisms to efficiently make use of the PIM functionality. Following the real-world HMC 2.0 specification, GraphPIM provides performance benefits for graph applications without any user code modification or ISA changes. In addition, we propose an extension to PIM operations that can further bring performance benefits for more graph applications. The evaluation results show that GraphPIM achieves up to a 2.4X speedup with a 37% reduction in energy consumption.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量