Balancing Graph Processing Workloads in Heterogeneous CPU-PIM Systems

Impact factor: 5.4; CAS Zone 2 (Computer Science); JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 3, pp. 1068-1082. Published: 2025-04-28. DOI: 10.1109/TETC.2025.3563249 (https://ieeexplore.ieee.org/document/10979263/)

Sheng Xu, Chun Li, Le Luo, Ming Zheng, Liang Yan, Xingqi Zou, Xiaoming Chen

Citations: 0

Abstract

Processing-in-Memory (PIM) is a promising architecture for alleviating the memory-wall problem in graph processing applications. The key idea of PIM is to integrate logic into memory and thereby exploit the advantages of near-data processing. State-of-the-art PIM-based graph processing accelerators tend to offload as much work as possible to memory to maximize near-data benefits, which causes significant load imbalance in PIM systems. In this paper, we show that this premise does not hold and that host processors still play a vital role in heterogeneous CPU-PIM systems. To this end, we propose CAPLBS, an online contention-aware Processing-in-Memory load-balance scheduler for graph processing applications in CPU-PIM systems. The core idea of CAPLBS is that, when some host processors are idle, they steal candidate workloads back from memory with minimal off-chip data synchronization overhead. To model data contention among workloads and make stealing decisions, we propose a measurement structure called the Locality Cohesive Subgraph, derived from a detailed analysis of the connectivity of the input graph and the memory access patterns of the deployed graph applications. Experimental results show that CAPLBS achieves average speedups of 4.8× and 1.3× (up to 9.1× and 1.9×) over a CPU-only baseline and over the upper bound of locality-aware fine-grained in-memory atomics, respectively. Moreover, CAPLBS adds no hardware overhead and works well with existing CPU-PIM graph processing accelerators.
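To make the stealing idea concrete, the following is a minimal, hypothetical sketch of contention-aware work stealing, not the CAPLBS algorithm itself. The abstract does not define how the Locality Cohesive Subgraph or the synchronization cost is computed, so this sketch assumes simple stand-ins: `cohesion` (the share of a vertex group's edge endpoints that stay inside the group) as a locality proxy, and `contention` (edges from the group to vertices still owned by other PIM tasks) as a proxy for off-chip synchronization overhead. All names (`Task`, `build_graph`, `cohesion`, `contention`, `pick_steal`, `min_cohesion`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# Undirected graph stored as adjacency sets.
Graph = Dict[int, Set[int]]


@dataclass
class Task:
    """Hypothetical unit of work: a vertex group queued on one PIM unit."""
    task_id: str
    vertices: Set[int]


def build_graph(edges: List[Tuple[int, int]]) -> Graph:
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def cohesion(graph: Graph, vertices: Set[int]) -> float:
    """Share of the group's edge endpoints that stay inside the group.
    An illustrative locality proxy, not the paper's definition."""
    total = sum(len(graph[v]) for v in vertices)
    inner = sum(1 for v in vertices for u in graph[v] if u in vertices)
    return inner / total if total else 1.0


def contention(graph: Graph, vertices: Set[int], pim_resident: Set[int]) -> int:
    """Edges from the group to vertices still owned by other PIM tasks;
    each would need off-chip synchronization if the group ran on the host."""
    return sum(1 for v in vertices for u in graph[v] if u in pim_resident)


def pick_steal(graph: Graph, pim_queues: Dict[int, List[Task]],
               min_cohesion: float) -> Optional[Tuple[int, Task]]:
    """Called when a host core is idle: steal one task from the most loaded
    PIM queue, preferring the lowest contention per stolen vertex."""
    loaded = {p: q for p, q in pim_queues.items() if q}
    if not loaded:
        return None
    # Victim = PIM unit with the most queued vertices.
    victim = max(loaded, key=lambda p: sum(len(t.vertices) for t in loaded[p]))
    # Every vertex currently owned by some PIM task.
    all_pim: Set[int] = set().union(*(t.vertices for q in loaded.values() for t in q))
    best: Optional[Task] = None
    best_cost = float("inf")
    for task in loaded[victim]:
        # Skip empty or poorly cohesive groups: stealing them would scatter accesses.
        if not task.vertices or cohesion(graph, task.vertices) < min_cohesion:
            continue
        others = all_pim - task.vertices
        cost = contention(graph, task.vertices, others) / len(task.vertices)
        if cost < best_cost:
            best, best_cost = task, cost
    if best is None:
        return None
    pim_queues[victim].remove(best)  # the host now owns this task
    return victim, best


if __name__ == "__main__":
    g = build_graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
                     (2, 3), (6, 0), (6, 4)])
    queues = {0: [Task("A", {0, 1, 2}), Task("B", {6})],
              1: [Task("C", {3, 4, 5})]}
    print(pick_steal(g, queues, min_cohesion=0.5))
```

In this toy graph (two triangles joined by an edge, plus a vertex 6 linked to both), PIM unit 0 is the most loaded, task B is rejected as non-cohesive, and the idle host steals task A, whose two cut edges are the only synchronization it would incur. The greedy "most loaded victim, cheapest candidate" rule is only one plausible policy; a real scheduler would also weigh the work gained against the synchronization cost.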

Source journal: IEEE Transactions on Emerging Topics in Computing

Subject area: Computer Science - Computer Science (miscellaneous)

CiteScore: 12.10

Self-citation rate: 5.10%

Articles published: 113

Journal description: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.