Active-Routing: Compute on the Way for Near-Data Processing

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2019-02-01 DOI:10.1109/HPCA.2019.00018

Jiayi Huang, Ramprakash Reddy Puli, Pritam Majumder, Sungkeun Kim, R. Boyapati, K. H. Yum, Eun Jung Kim

{"title":"Active-Routing: Compute on the Way for Near-Data Processing","authors":"Jiayi Huang, Ramprakash Reddy Puli, Pritam Majumder, Sungkeun Kim, R. Boyapati, K. H. Yum, Eun Jung Kim","doi":"10.1109/HPCA.2019.00018","DOIUrl":null,"url":null,"abstract":"—The explosion of data availability and the demand for faster data analysis have led to the emergence of applications exhibiting large memory footprint and low data reuse rate. These workloads, ranging from neural networks to graph processing, expose compute kernels that operate over myriads of data. Signiﬁcant data movement requirements of these kernels impose heavy stress on modern memory subsystems and communication fabrics. To mitigate the worsening gap between high CPU computation density and deﬁcient memory bandwidth, solutions like memory networks and near-data processing designs are being architected to improve system performance substantially. In this work, we examine the idea of mapping compute ker- nels to the memory network so as to leverage in-network computing in data-ﬂow style, by means of near-data processing. We propose Active-Routing , an in-network compute architecture that enables computation on the way for near-data processing by exploiting patterns of aggregation over intermediate results of arithmetic operators. The proposed architecture leverages the massive memory-level parallelism and network concurrency to optimize the aggregation operations along a dynamically built Active-Routing Tree . Our evaluations show that Active-Routing can achieve upto 7 × speedup with an average of 60% performance improvement, and reduce the energy-delay product by 80% across various benchmarks compared to the state-of-the-art processing-in-memory architecture.","PeriodicalId":102050,"journal":{"name":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2019.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 21

Abstract

—The explosion of data availability and the demand for faster data analysis have led to the emergence of applications exhibiting large memory footprint and low data reuse rate. These workloads, ranging from neural networks to graph processing, expose compute kernels that operate over myriads of data. Signiﬁcant data movement requirements of these kernels impose heavy stress on modern memory subsystems and communication fabrics. To mitigate the worsening gap between high CPU computation density and deﬁcient memory bandwidth, solutions like memory networks and near-data processing designs are being architected to improve system performance substantially. In this work, we examine the idea of mapping compute ker- nels to the memory network so as to leverage in-network computing in data-ﬂow style, by means of near-data processing. We propose Active-Routing , an in-network compute architecture that enables computation on the way for near-data processing by exploiting patterns of aggregation over intermediate results of arithmetic operators. The proposed architecture leverages the massive memory-level parallelism and network concurrency to optimize the aggregation operations along a dynamically built Active-Routing Tree . Our evaluations show that Active-Routing can achieve upto 7 × speedup with an average of 60% performance improvement, and reduce the energy-delay product by 80% across various benchmarks compared to the state-of-the-art processing-in-memory architecture.

查看原文本刊更多论文

主动路由:近数据处理方式上的计算

数据可用性的爆炸式增长和对更快的数据分析的需求导致了大量内存占用和低数据重用率的应用程序的出现。这些工作负载，从神经网络到图形处理，暴露了在无数数据上操作的计算内核。这些内核的重要数据移动需求给现代存储子系统和通信结构带来了沉重的压力。为了缓解高CPU计算密度和内存带宽不足之间日益恶化的差距，人们正在设计内存网络和近数据处理设计等解决方案，以大幅提高系统性能。在这项工作中，我们研究了将计算内核映射到内存网络的想法，以便通过近数据处理来利用数据流风格的网络内计算。我们提出主动路由，这是一种网络内计算架构，通过利用算术运算符中间结果的聚合模式，使近数据处理的计算成为可能。所提出的体系结构利用大量内存级并行性和网络并发性，沿着动态构建的活动路由树优化聚合操作。我们的评估表明，与最先进的内存处理架构相比，Active-Routing可以实现高达7倍的加速，平均性能提高60%，并在各种基准测试中将能量延迟产品降低80%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量