{"title":"异构CPU-PIM系统图处理负载均衡","authors":"Sheng Xu;Chun Li;Le Luo;Ming Zheng;Liang Yan;Xingqi Zou;Xiaoming Chen","doi":"10.1109/TETC.2025.3563249","DOIUrl":null,"url":null,"abstract":"Processing-in-Memory (PIM) offers a promising architecture to alleviate the memory wall challenge in graph processing applications. The key aspect of PIM is to incorporate logic within the memory, thereby leveraging the near-data advantages. State-of-the-art PIM-based graph processing accelerators tend to offload more to the memory in order to maximize near-data benefits, causing significant load imbalance in PIM systems. In this paper, we demonstrate that this intention is not true and that host processors still play a vital role in heterogeneous CPU-PIM systems. For this purpose, we propose CAPLBS, an online contention-aware Processing-in-Memory load-balance scheduler for graph processing applications in CPU-PIM systems. The core concept of CAPLBS is to steal workload candidates back to host processors with minimal off-chip data synchronization overhead when some host processors are idle. To model data contentions among workloads and determine the stealing decision, a measurement structure called Locality Cohesive Subgraph is proposed by deeply exploring the connectivity of the input graph and the memory access patterns of deployed graph applications. Experimental results show that CAPLBS achieved an average speed-up of 4.8× and 1.3× (up to 9.1× and 1.9×) compared with CPU-only and the upper bound of locality-aware fine-grained in-memory atomics. Moreover, CAPLBS adds no hardware overhead and works well with existing CPU-PIM graph processing accelerators.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 3","pages":"1068-1082"},"PeriodicalIF":5.4000,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Balancing Graph Processing Workloads in Heterogeneous CPU-PIM Systems\",\"authors\":\"Sheng Xu;Chun Li;Le Luo;Ming Zheng;Liang Yan;Xingqi Zou;Xiaoming Chen\",\"doi\":\"10.1109/TETC.2025.3563249\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Processing-in-Memory (PIM) offers a promising architecture to alleviate the memory wall challenge in graph processing applications. The key aspect of PIM is to incorporate logic within the memory, thereby leveraging the near-data advantages. State-of-the-art PIM-based graph processing accelerators tend to offload more to the memory in order to maximize near-data benefits, causing significant load imbalance in PIM systems. In this paper, we demonstrate that this intention is not true and that host processors still play a vital role in heterogeneous CPU-PIM systems. For this purpose, we propose CAPLBS, an online contention-aware Processing-in-Memory load-balance scheduler for graph processing applications in CPU-PIM systems. The core concept of CAPLBS is to steal workload candidates back to host processors with minimal off-chip data synchronization overhead when some host processors are idle. To model data contentions among workloads and determine the stealing decision, a measurement structure called Locality Cohesive Subgraph is proposed by deeply exploring the connectivity of the input graph and the memory access patterns of deployed graph applications. Experimental results show that CAPLBS achieved an average speed-up of 4.8× and 1.3× (up to 9.1× and 1.9×) compared with CPU-only and the upper bound of locality-aware fine-grained in-memory atomics. Moreover, CAPLBS adds no hardware overhead and works well with existing CPU-PIM graph processing accelerators.\",\"PeriodicalId\":13156,\"journal\":{\"name\":\"IEEE Transactions on Emerging Topics in Computing\",\"volume\":\"13 3\",\"pages\":\"1068-1082\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Emerging Topics in Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10979263/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10979263/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Balancing Graph Processing Workloads in Heterogeneous CPU-PIM Systems
Processing-in-Memory (PIM) offers a promising architecture to alleviate the memory wall challenge in graph processing applications. The key aspect of PIM is to incorporate logic within the memory, thereby leveraging the near-data advantages. State-of-the-art PIM-based graph processing accelerators tend to offload more to the memory in order to maximize near-data benefits, causing significant load imbalance in PIM systems. In this paper, we demonstrate that this intention is not true and that host processors still play a vital role in heterogeneous CPU-PIM systems. For this purpose, we propose CAPLBS, an online contention-aware Processing-in-Memory load-balance scheduler for graph processing applications in CPU-PIM systems. The core concept of CAPLBS is to steal workload candidates back to host processors with minimal off-chip data synchronization overhead when some host processors are idle. To model data contentions among workloads and determine the stealing decision, a measurement structure called Locality Cohesive Subgraph is proposed by deeply exploring the connectivity of the input graph and the memory access patterns of deployed graph applications. Experimental results show that CAPLBS achieved an average speed-up of 4.8× and 1.3× (up to 9.1× and 1.9×) compared with CPU-only and the upper bound of locality-aware fine-grained in-memory atomics. Moreover, CAPLBS adds no hardware overhead and works well with existing CPU-PIM graph processing accelerators.
期刊介绍:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, & Health-care IT.