Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-In-Memory Hardware

2021 12th International Green and Sustainable Computing Conference (IGSC). Pub Date: 2021-10-04. DOI: 10.1109/IGSC54211.2021.9651614

Juan Gómez-Luna, I. E. Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, O. Mutlu

Citations: 37

Abstract

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bound workloads is insufficient to amortize the cost of memory access. Fundamentally addressing this data movement bottleneck requires a paradigm where the memory system assumes an active role in computing by integrating processing capabilities. This paradigm is known as processing-in-memory (PIM). Recent research explores different forms of PIM architectures, motivated by the emergence of new technologies that integrate memory with a logic layer, where processing elements can be easily placed. Past works evaluate these architectures in simulation or, at best, with simplified hardware prototypes. In contrast, the UPMEM company has designed and manufactured the first publicly-available real-world PIM architecture. The UPMEM PIM architecture combines traditional DRAM memory arrays with general-purpose in-order cores, called DRAM Processing Units (DPUs), integrated in the same chip. This paper presents key takeaways from the first comprehensive analysis [1] of the first publicly-available real-world PIM architecture. First, we introduce our experimental characterization of the UPMEM PIM architecture using microbenchmarks, and present PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domains (e.g., dense/sparse linear algebra, databases, data analytics, graph processing, neural networks, bioinformatics, image processing), which we identify as memory-bound. 
Second, we provide four key takeaways about the UPMEM PIM architecture, which stem from our study of the performance and scaling characteristics of PrIM benchmarks on the UPMEM PIM architecture, and their performance and energy consumption comparison to their state-of-the-art CPU and GPU counterparts. More insights about suitability of different workloads to the PIM system, programming recommendations for software designers, and suggestions and hints for hardware and architecture designers of future PIM systems are available in [1].
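
The abstract's point that low data reuse cannot amortize the cost of memory access can be made concrete with the roofline model, a standard analysis technique that is not taken from the paper. The Python sketch below compares a kernel's arithmetic intensity (operations per byte moved) against a machine's ridge point. The peak throughput (1000 GOPS), the DRAM bandwidth (100 GB/s), and the helper names `arithmetic_intensity`, `attainable_gops`, and `is_memory_bound` are hypothetical, chosen only for illustration.

```python
"""Roofline-style check of whether a kernel is memory-bound.

All machine parameters are hypothetical, not measurements from the paper.
"""


def arithmetic_intensity(ops: float, bytes_moved: float) -> float:
    """Operations performed per byte moved between memory and the cores."""
    return ops / bytes_moved


def attainable_gops(ai: float, peak_gops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: the lower of the compute roof and bandwidth * intensity."""
    return min(peak_gops, bandwidth_gbs * ai)


def is_memory_bound(ai: float, peak_gops: float, bandwidth_gbs: float) -> bool:
    """A kernel is memory-bound when its intensity lies left of the ridge point."""
    ridge = peak_gops / bandwidth_gbs  # ops/byte where both roofs meet
    return ai < ridge


# Hypothetical CPU: 1000 GOPS peak, 100 GB/s DRAM bandwidth -> ridge at 10 ops/byte.
PEAK, BW = 1000.0, 100.0

# Vector addition on 32-bit ints, c[i] = a[i] + b[i]:
# 1 op per element, 12 bytes moved (two 4-byte loads, one 4-byte store), no reuse.
n = 1 << 20
va_ai = arithmetic_intensity(ops=n, bytes_moved=12 * n)

# Idealized dense matrix multiply on float32 (m x m): 2*m^3 ops, and each of the
# three matrices crosses the memory bus exactly once (perfect on-chip reuse).
m = 1024
gemm_ai = arithmetic_intensity(ops=2 * m**3, bytes_moved=3 * m * m * 4)

for name, ai in (("vector add", va_ai), ("dense matmul", gemm_ai)):
    kind = "memory-bound" if is_memory_bound(ai, PEAK, BW) else "compute-bound"
    print(f"{name}: AI = {ai:.3f} ops/byte, "
          f"attainable = {attainable_gops(ai, PEAK, BW):.2f} GOPS, {kind}")
```

With these assumed numbers, vector addition (no data reuse) reaches only about 8.33 GOPS and is memory-bound, while the idealized matrix multiply, whose operands are reused many times, lands well above the ridge point. In this model, placing computation next to the memory arrays, as PIM does, raises the bandwidth available to kernels of the first kind rather than their compute roof.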

Source journal

2021 12th International Green and Sustainable Computing Conference (IGSC)

Self-citation rate

0.00%

Number of publications