PIM-enabled instructions: A low-overhead, locality-aware processing-in-memory architecture

Junwhan Ahn, S. Yoo, O. Mutlu, Kiyoung Choi
Published in: 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 336-348
Publication date: 2015-06-13
DOI: 10.1145/2749469.2750385 (https://doi.org/10.1145/2749469.2750385)
Citations: 443

Abstract

Processing-in-memory (PIM) is rapidly rising as a viable solution for the memory wall crisis, rebounding from its unsuccessful attempts in the 1990s due to practicality concerns, which are alleviated with recent advances in 3D stacking technologies. However, it is still challenging to integrate PIM architectures with existing systems in a seamless manner due to two common characteristics: unconventional programming models for in-memory computation units and lack of ability to utilize large on-chip caches. In this paper, we propose a new PIM architecture that (1) does not change the existing sequential programming models and (2) automatically decides whether to execute PIM operations in memory or in processors depending on the locality of data. The key idea is to implement simple in-memory computation using compute-capable memory commands and use specialized instructions, which we call PIM-enabled instructions, to invoke in-memory computation. This allows PIM operations to be interoperable with existing programming models, cache coherence protocols, and virtual memory mechanisms with no modification. In addition, we introduce a simple hardware structure that monitors the locality of data accessed by a PIM-enabled instruction at runtime to adaptively execute the instruction at the host processor (instead of in memory) when the instruction can benefit from large on-chip caches. Consequently, our architecture provides the illusion that PIM operations are executed as if they were host processor instructions. We provide a case study of how ten emerging data-intensive workloads can benefit from our new PIM abstraction and its hardware implementation. Evaluations show that our architecture significantly improves system performance and, more importantly, combines the best parts of conventional and PIM architectures by adapting to data locality of applications.
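The locality-aware decision described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware structure from the paper: the abstract does not say how the monitor is organized, so the LRU set of recently touched block tags, the 64-byte block size, and the names `LocalityMonitor` and `dispatch` are all hypothetical. The idea shown is simply that an instruction whose target block was touched recently is likely to hit in the on-chip caches and is run at the host, while one with no recent reuse is offloaded to memory.

```python
from collections import OrderedDict

BLOCK_BYTES = 64  # assumed cache-block size, not taken from the abstract


class LocalityMonitor:
    """Toy tracker of recently touched cache blocks (LRU, fixed capacity).

    Hypothetical stand-in for the hardware locality monitor.
    """

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self._recent: "OrderedDict[int, None]" = OrderedDict()

    def touch(self, addr: int) -> bool:
        """Record an access; return True if the block was already tracked."""
        block = addr // BLOCK_BYTES
        hit = block in self._recent
        if hit:
            # Refresh recency of a block that is being reused.
            self._recent.move_to_end(block)
        else:
            self._recent[block] = None
            if len(self._recent) > self.capacity:
                # Evict the least recently used block.
                self._recent.popitem(last=False)
        return hit


def dispatch(monitor: LocalityMonitor, addr: int) -> str:
    """Choose where one PIM-enabled instruction targeting addr would run."""
    return "host" if monitor.touch(addr) else "memory"


if __name__ == "__main__":
    monitor = LocalityMonitor(capacity=2)
    for addr in [0x1000, 0x1008, 0x2000, 0x3000, 0x1000]:
        print(hex(addr), dispatch(monitor, addr))
```

In this toy trace only the second access (0x1008, same block as 0x1000) is steered to the host. The final access to 0x1000 goes to memory again because its block was evicted by the two intervening blocks, which is the kind of low-locality pattern where in-memory execution is expected to help.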