Active-Routing: Compute on the Way for Near-Data Processing

Jiayi Huang, Ramprakash Reddy Puli, Pritam Majumder, Sungkeun Kim, R. Boyapati, K. H. Yum, Eun Jung Kim
Published in: 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019-02-01
DOI: 10.1109/HPCA.2019.00018 (https://doi.org/10.1109/HPCA.2019.00018)
Citations: 21

Abstract

The explosion of data availability and the demand for faster data analysis have led to the emergence of applications exhibiting large memory footprints and low data reuse rates. These workloads, ranging from neural networks to graph processing, expose compute kernels that operate over myriads of data. The significant data movement requirements of these kernels impose heavy stress on modern memory subsystems and communication fabrics. To mitigate the worsening gap between high CPU computation density and deficient memory bandwidth, solutions like memory networks and near-data processing designs are being architected to improve system performance substantially. In this work, we examine the idea of mapping compute kernels to the memory network so as to leverage in-network computing in data-flow style, by means of near-data processing. We propose Active-Routing, an in-network compute architecture that enables computation on the way for near-data processing by exploiting patterns of aggregation over intermediate results of arithmetic operators. The proposed architecture leverages the massive memory-level parallelism and network concurrency to optimize the aggregation operations along a dynamically built Active-Routing Tree. Our evaluations show that Active-Routing can achieve up to 7× speedup with an average of 60% performance improvement, and reduce the energy-delay product by 80% across various benchmarks compared to the state-of-the-art processing-in-memory architecture.
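As a rough illustration of aggregating intermediate results along a tree, the toy model below reduces a sum of products bottom-up. It is only a sketch under assumptions and not the architecture described in the paper: the `Node` class, the `aggregate` and `operands_moved_without_aggregation` functions, the fixed tree shape, and the traffic counters are all hypothetical names introduced here, and the tree is built statically rather than dynamically. Each node multiplies the operand pairs it holds locally and forwards a single partial sum to its parent, instead of shipping every operand to the host.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical memory-network node that stores operand pairs locally."""
    name: str
    local_pairs: List[Tuple[float, float]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def aggregate(node: Node, traffic: Dict[str, int]) -> float:
    """Reduce sum(a * b) bottom-up over the tree rooted at `node`.

    Every child sends exactly one partial result to its parent, so the
    counter grows by one per tree edge.
    """
    partial = sum(a * b for a, b in node.local_pairs)
    for child in node.children:
        partial += aggregate(child, traffic)
        traffic["partials_forwarded"] += 1
    return partial


def operands_moved_without_aggregation(node: Node) -> int:
    """Count operands that would travel if every value were sent to the host."""
    own = 2 * len(node.local_pairs)
    return own + sum(operands_moved_without_aggregation(c) for c in node.children)


def build_example() -> Node:
    """Build a small, fixed example tree (an assumption for illustration)."""
    leaf = Node("c3", [(9.0, 10.0)])
    left = Node("c1", [(3.0, 4.0), (5.0, 6.0)])
    right = Node("c2", [(7.0, 8.0)], [leaf])
    return Node("root", [(1.0, 2.0)], [left, right])


if __name__ == "__main__":
    root = build_example()
    traffic = {"partials_forwarded": 0}
    total = aggregate(root, traffic)
    print("sum of products:", total)
    print("partial results forwarded:", traffic["partials_forwarded"])
    print("operands moved without aggregation:",
          operands_moved_without_aggregation(root))
```

In this toy tree, three partial results cross the network in place of ten individual operands. The counts describe only this example and say nothing about the speedup or energy figures reported in the abstract.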