Fast pipelined storage for high-performance energy-efficient computing with superconductor technology

2015 12th International Conference & Expo on Emerging Technologies for a Smarter World (CEWIT) Pub Date : 2015-12-03 DOI:10.1109/CEWIT.2015.7338159

M. Dorojevets, Zuoting Chen


Citations: 11

Abstract

New superconductor single flux quantum (SFQ) technology, such as Reciprocal Quantum Logic (RQL), is currently considered one of the promising candidates for high-performance energy-efficient computing. This paper presents our work on the design and detailed energy efficiency analysis of three types of 32- and 64-bit RQL multi-ported pipelined local storage structures (13 designs in total), namely 1) random access memory (RAM) and register files, 2) direct-mapped write-through and write-back caches, and 3) first-in-first-out (FIFO) buffers. Our layout-aware cell-level design process uses a VHDL RQL cell library developed at the Ultra High Speed Computing Laboratory at Stony Brook University (SBU). The SBU VHDL RQL cell library specifies the dynamic and standby energy consumption, gate delays, the number of Josephson junctions (JJs) per cell, and the approximate sizes of individual cells, based on the parameters of the 248 nm, 100 μA/μm² SFQ fabrication process with 10 Nb metal layers currently under development at the MIT Lincoln Laboratory. Gate and wire delays as well as clock skew are taken into account during digital circuit simulation done with Mentor Graphics CAD tools. After a physical chip layout is completed, the circuit models need to be updated and re-simulated to include the effects of parasitic inductances and actual wire lengths on signal propagation delays. To meet both performance and energy efficiency targets, the RQL storage structures were designed with RQL non-destructive read-out single-bit storage cells. We chose a relatively moderate clock frequency of 8.5 GHz for all storage units to keep their read latencies in the range of 1-3 cycles. The most complex design in terms of JJ count is a triple-ported 4 Kbit 64x64-bit register file with 253,918 JJs and a read access latency of 338 ps. The highest energy consumption per operation per bit (~9.5 aJ at 4.2 K) occurs for a write hit in a 2 Kbit 32-bit wide write-back cache. The average energy consumption of the RQL storage designs varies from ~1.6 aJ/operation/bit for a small 4x32-bit FIFO to 7.3 aJ/operation/bit for the 2 Kbit write-back cache at 4.2 K. Assuming a cryocooler efficiency of 0.1%, this corresponds to an energy consumption of ~1.6-7.3 fJ/operation/bit at room temperature. The physical implementation of the RQL storage units will become feasible upon the development of the target MIT fabrication process and CAD tools for VLSI RQL chip design in 2015-2016.
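The arithmetic behind two of the abstract's figures can be checked with a short sketch. It is not taken from the paper: the function names are hypothetical, and it assumes that the room-temperature energy is obtained by simply dividing the energy dissipated at 4.2 K by the cryocooler efficiency, which is consistent with the quoted 1.6 aJ to 1.6 fJ conversion.

```python
# Back-of-the-envelope checks on figures quoted in the abstract.
# Helper names are illustrative; they do not come from the paper.

CLOCK_HZ = 8.5e9               # clock frequency stated in the abstract
CRYOCOOLER_EFFICIENCY = 1e-3   # 0.1%, as stated in the abstract


def cycle_time_ps(clock_hz: float) -> float:
    """Clock period in picoseconds."""
    return 1e12 / clock_hz


def latency_in_cycles(latency_ps: float, clock_hz: float) -> float:
    """Express a latency given in picoseconds as a number of clock cycles."""
    return latency_ps / cycle_time_ps(clock_hz)


def room_temperature_energy_fj(energy_aj_at_4k: float, efficiency: float) -> float:
    """Room-temperature energy in fJ for an energy in aJ dissipated at 4.2 K.

    Assumption: the conversion is a plain division by cryocooler efficiency.
    """
    return energy_aj_at_4k / efficiency / 1000.0  # 1 fJ = 1000 aJ


if __name__ == "__main__":
    # Register file read latency of 338 ps at 8.5 GHz.
    print(f"cycle time: {cycle_time_ps(CLOCK_HZ):.1f} ps")
    print(f"338 ps latency: {latency_in_cycles(338.0, CLOCK_HZ):.2f} cycles")
    # Average and peak energies per operation per bit from the abstract.
    for aj in (1.6, 7.3, 9.5):
        fj = room_temperature_energy_fj(aj, CRYOCOOLER_EFFICIENCY)
        print(f"{aj} aJ at 4.2 K -> {fj:.1f} fJ at room temperature")
```

At 8.5 GHz the clock period is about 117.6 ps, so the 338 ps read latency of the register file is roughly 2.9 cycles, which sits at the upper end of the stated 1-3 cycle range.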
