Bridge-NDP: Efficient Communication-Computation Overlap in Near Data Processing System

IF 2.9 | Zone 3 (Computer Science) | JCR Q2: Computer Science, Hardware & Architecture
Liyan Chen;Pengyu Liu;Dongxu Lyu;Jianfei Jiang;Qin Wang;Zhigang Mao;Naifeng Jing
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 8, pp. 2939-2951
Publication date: 2025-01-17
Publication type: Journal Article
DOI: 10.1109/TCAD.2025.3531254
URL: https://ieeexplore.ieee.org/document/10844857/
Citation count: 0

Abstract

Near data processing (NDP), enabled by near data accelerators (NDAs) within DIMM-based main memory, enhances performance by providing more aggregated bandwidth and reducing long-distance data transfers. While the performance of NDAs has received widespread attention, the overhead of host-NDA communication has been overlooked, becoming a bottleneck in NDP systems. To alleviate performance degradation from communication, we propose Bridge-NDP, the first NDP architecture that implements a workflow with efficient communication-computation overlap. Bridge-NDP is built upon the conventional NDP architecture and can be easily applied to existing NDP designs, regardless of the memory level where NDAs are attached. Specifically, we introduce a novel direct host-NDA communication method that utilizes existing memory buses as bridge buses, avoiding the need for new interconnections. It enables seamless integration with other memory accesses while achieving high bandwidth utilization with minimal hardware overhead. For the system-level workflow design, we optimize and extend existing dataflow to achieve richer computing paradigms with fewer redundant memory accesses. Additionally, we provide programming support with efficient API designs and data management to hide low-level resource details and ensure correctness guarantees. Comprehensive experiments demonstrate that Bridge-NDP achieves significant performance improvements, with speedups of $1.8\times$–$3.1\times$ and bandwidth utilization improvement of $2.0\times$–$2.9\times$ over the state-of-the-art NDP solutions.
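
To illustrate why communication-computation overlap matters, the following is a minimal sketch of a generic two-stage pipeline timing model. It is not the Bridge-NDP workflow or API: the function names, the chunked workload, and the timing values are all hypothetical, and the model assumes enough buffer space that transfers never stall waiting for compute.

```python
from typing import Sequence


def serialized_time(transfer: Sequence[float], compute: Sequence[float]) -> float:
    """Total time when every chunk is transferred, then computed, strictly in turn."""
    if len(transfer) != len(compute):
        raise ValueError("transfer and compute must have one entry per chunk")
    return sum(transfer) + sum(compute)


def overlapped_time(transfer: Sequence[float], compute: Sequence[float]) -> float:
    """Total time when the transfer of chunk i+1 overlaps the compute of chunk i.

    Hypothetical model: transfers run back to back on the bus, and the
    accelerator starts a chunk as soon as both its data has arrived and the
    previous chunk has finished computing.
    """
    if len(transfer) != len(compute):
        raise ValueError("transfer and compute must have one entry per chunk")
    transfer_done = 0.0  # time at which the current chunk's data has arrived
    compute_done = 0.0   # time at which the accelerator becomes free again
    for t, c in zip(transfer, compute):
        transfer_done += t
        compute_done = max(compute_done, transfer_done) + c
    return compute_done


if __name__ == "__main__":
    # Illustrative per-chunk costs in arbitrary time units (not measured data).
    transfer = [2, 2, 2, 2]
    compute = [3, 3, 3, 3]
    print(serialized_time(transfer, compute))  # 20
    print(overlapped_time(transfer, compute))  # 14.0
```

In this toy model the overlapped schedule hides all but the first transfer behind computation, so the total time approaches the larger of the two stage totals. The numbers are illustrative only and say nothing about the speedups reported in the abstract.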
Source journal
CiteScore: 5.60
Self-citation rate: 13.80%
Articles published: 500
Review time: 7 months
Journal description: The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.