Energy effective 3D stacked hybrid NEMFET-CMOS caches

2014 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH) Pub Date : 2014-07-08 DOI:10.1145/2770287.2770324

M. Lefter, M. Enachescu, G. Voicu, S. Cotofana

Pages: 151-156

Citations: 4

Abstract

In this paper we propose to utilise 3D-stacked hybrid memories as an alternative to traditional CMOS SRAMs in L1 and L2 cache implementations, and we analyse the potential implications of this approach on processor performance, measured in Instructions-per-Cycle (IPC), and on energy consumption. The 3D hybrid memory cell relies on: (i) a Short Circuit Current Free Nano-Electro-Mechanical Field Effect Transistor (SCCF NEMFET) based inverter for data storage; and (ii) adjacent CMOS-based logic for read/write operations and data preservation. We compare 3D Stacked Hybrid NEMFET-CMOS Caches (3DS-HNCC) of various capacities against state-of-the-art 45 nm low-power CMOS SRAM counterparts (2D-CC). All the proposed implementations provide a two-orders-of-magnitude reduction in static energy (due to the NEMFET's extremely low OFF current) at the cost of slightly increased dynamic energy consumption and an approximately 55% larger footprint. The read access time is equivalent, while the write access time is about 3 ns longer, as it is dominated by the mechanical movement of the NEMFET's suspended gate. To determine whether this write latency overhead incurs any performance penalty, we use as an evaluation vehicle a state-of-the-art mobile out-of-order processor core equipped with 32-kB instruction and data L1 caches and a unified 2-MB L2 cache. We evaluate different scenarios, utilising both 3DS-HNCC and 2D-CC at different hierarchy levels, on a set of SPEC 2000 benchmarks. Our simulations indicate that, for the considered applications and despite their increased write access time, 3DS-HNCC L2 caches incur an insignificant IPC penalty while providing, on average, 38% energy savings when compared with 2D-CC. For L1 instruction caches the IPC penalty is also almost insignificant, while for L1 data caches IPC decreases of between 1% and 12% were measured.
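The trade-off described in the abstract (much lower leakage, slightly higher dynamic energy, a fixed extra write latency) can be illustrated with a first-order back-of-the-envelope model. The sketch below is not the authors' evaluation method, which relies on full-system simulation; the function names, the clock period, the leakage share of cache energy and the 5% dynamic-energy increase are all hypothetical assumptions chosen for illustration. It converts the roughly 3 ns write overhead into clock cycles and estimates net cache energy savings when static energy shrinks by a factor of 100.

```python
# Hypothetical first-order model. All parameter values are illustrative
# assumptions, not figures reported in the paper.


def write_penalty_cycles(extra_write_ps: int, clock_period_ps: int) -> int:
    """Extra cycles a write occupies when its latency grows by extra_write_ps.

    Integer picoseconds are used so the ceiling is exact (no float rounding).
    """
    return -(-extra_write_ps // clock_period_ps)  # ceiling division


def energy_savings(static_fraction: float,
                   static_reduction: float = 100.0,
                   dynamic_increase: float = 0.05,
                   runtime_scale: float = 1.0) -> float:
    """Fractional cache energy saving of a hybrid cache over a CMOS baseline.

    static_fraction:  assumed share of baseline cache energy that is leakage.
    static_reduction: factor by which leakage drops (100 = two orders of magnitude).
    dynamic_increase: assumed relative growth in dynamic energy (0.05 = 5%).
    runtime_scale:    execution-time ratio hybrid/baseline; leakage energy is
                      power times time, so it scales with runtime, while dynamic
                      energy is assumed to depend only on the access count.
    """
    if not 0.0 <= static_fraction <= 1.0:
        raise ValueError("static_fraction must lie in [0, 1]")
    baseline = 1.0  # baseline cache energy, normalised
    static = static_fraction * runtime_scale / static_reduction
    dynamic = (1.0 - static_fraction) * (1.0 + dynamic_increase)
    return 1.0 - (static + dynamic) / baseline


if __name__ == "__main__":
    # Assumed 2 GHz clock (500 ps period) and a 3 ns (3000 ps) write overhead.
    print(write_penalty_cycles(3000, 500))
    # Assumed 50% leakage share of baseline cache energy.
    print(round(energy_savings(0.5), 3))
```

Under these assumptions the 3 ns overhead costs 6 cycles per write at 2 GHz, and a cache whose baseline energy is half leakage saves about 47%. The model also makes one structural point explicit: with `runtime_scale` at 1 the saving can never exceed the leakage share, and it turns negative when leakage is negligible, because the higher dynamic energy then dominates. A longer runtime (lower IPC) erodes the saving further by lengthening the time over which the remaining leakage accumulates.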



