Bridge-NDP: Efficient Communication-Computation Overlap in Near Data Processing System
Authors: Liyan Chen; Pengyu Liu; Dongxu Lyu; Jianfei Jiang; Qin Wang; Zhigang Mao; Naifeng Jing
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 8, pp. 2939-2951
DOI: 10.1109/TCAD.2025.3531254
Publication date: 2025-01-17
URL: https://ieeexplore.ieee.org/document/10844857/
Citation count: 0
Abstract
Near data processing (NDP), enabled by near data accelerators (NDAs) within DIMM-based main memory, enhances performance by providing more aggregated bandwidth and reducing long-distance data transfers. While the performance of NDAs has received widespread attention, the overhead of host-NDA communication has been overlooked, becoming a bottleneck in NDP systems. To alleviate performance degradation from communication, we propose Bridge-NDP, the first NDP architecture that implements a workflow with efficient communication-computation overlap. Bridge-NDP is built upon the conventional NDP architecture and can be easily applied to existing NDP designs, regardless of the memory level where NDAs are attached. Specifically, we introduce a novel direct host-NDA communication method that utilizes existing memory buses as bridge buses, avoiding the need for new interconnections. It enables seamless integration with other memory accesses while achieving high bandwidth utilization with minimal hardware overhead. For the system-level workflow design, we optimize and extend existing dataflow to achieve richer computing paradigms with fewer redundant memory accesses. Additionally, we provide programming support with efficient API designs and data management to hide low-level resource details and ensure correctness guarantees. Comprehensive experiments demonstrate that Bridge-NDP achieves significant performance improvements, with speedups of $1.8\times$–$3.1\times$ and bandwidth utilization improvement of $2.0\times$–$2.9\times$ over the state-of-the-art NDP solutions.
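As a rough illustration of why communication-computation overlap helps, the sketch below compares a fully serialized transfer-then-compute schedule with a simple two-stage pipeline in which the transfer of the next chunk proceeds while the current chunk is being computed. This is a generic timing model, not the Bridge-NDP workflow itself: the function names, the chunk timings, and the assumption that buffering never stalls the transfers are all hypothetical, and the resulting numbers are not related to the speedups reported in the abstract.

```python
def serial_time(comm, comp):
    """Total time when every chunk is transferred and then computed, with no overlap."""
    return sum(comm) + sum(comp)


def overlapped_time(comm, comp):
    """Total time for a two-stage pipeline (hypothetical model).

    Transfers run back to back on the bus; compute on a chunk starts once
    its transfer has finished and the previous chunk's compute is done.
    Assumes enough buffering that transfers never have to wait.
    """
    transfer_done = 0.0
    compute_done = 0.0
    for t_comm, t_comp in zip(comm, comp):
        transfer_done += t_comm
        compute_done = max(compute_done, transfer_done) + t_comp
    return compute_done


# Hypothetical per-chunk timings in arbitrary time units.
comm = [2, 2, 2, 2]
comp = [3, 3, 3, 3]
print(serial_time(comm, comp))      # serialized schedule
print(overlapped_time(comm, comp))  # pipelined schedule
```

In this toy model only the first transfer remains exposed when compute dominates, so the benefit of overlap grows as communication time approaches compute time.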
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.