Shuo-Han Chen, Yuan-Hao Chang, Tseng-Yi Chen, Yu-Ming Chang, Pei-Wen Hsiao, H. Wei, W. Shih
{"title":"Enhancing the Energy Efficiency of Journaling File System via Exploiting Multi-Write Modes on MLC NVRAM","authors":"Shuo-Han Chen, Yuan-Hao Chang, Tseng-Yi Chen, Yu-Ming Chang, Pei-Wen Hsiao, H. Wei, W. Shih","doi":"10.1145/3218603.3218632","DOIUrl":"https://doi.org/10.1145/3218603.3218632","url":null,"abstract":"Non-volatile random-access memory (NVRAM) is regarded as a great alternative storage medium owing to its attractive features, including low idle energy consumption, byte addressability, and short read/write latency. In addition, multi-level-cell (MLC) NVRAM has also been proposed to provide higher bit density. However, MLC NVRAM has lower energy efficiency and longer write latency when compared with single-level-cell (SLC) NVRAM. These drawbacks could lead to higher energy consumption of MLC NVRAM-based storage systems. The energy consumption is magnified by existing journaling file systems (JFS) on MLC NVRAM-based storage devices due to the JFS's fail-safe policy of writing the same data twice. Such observations motivate us to propose a multi-write-mode journaling file systems (mwJFS) to alleviate the drawbacks of MLC NVRAM and lower the energy consumption of MLC NVRAM-based JFS. The proposed mwJFS differentiates the data retention requirement of journaled data and applies different write modes to enhance the energy efficiency with better access performance. A series of experiments was conducted to demonstrate the capability of mwJFS on a MLC NVRAM-based storage system.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82512930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling Intra-Plane Parallel Block Erase in NAND Flash to Alleviate the Impact of Garbage Collection","authors":"Tyler Garrett, Jun Yang, Youtao Zhang","doi":"10.1145/3218603.3218627","DOIUrl":"https://doi.org/10.1145/3218603.3218627","url":null,"abstract":"Garbage collection (GC) in NAND flash can significantly decrease I/O performance in SSDs by copying valid data to other locations, thus blocking incoming I/O requests. To help improve performance, NAND flash utilizes various advanced commands to increase internal parallelism. Currently, these commands only parallelize operations across channels, chips, dies, and planes, neglecting the block level due to risk of disturbances that can compromise valid data by inducing errors. However, due to the triple-well structure of the NAND flash plane architecture, it is possible to erase multiple blocks within a plane, in parallel, without diminishing the integrity of the valid data. The number of page movements due to multiple block erases can be restrained so as to bound the overhead per GC. Moreover, more capacity can be reclaimed per GC which delays future GCs and effectively reduces their frequency. Such an Intra-Plane Parallel Block Erase (IPPBE) in turn diminishes the impact of GC on incoming requests, improving their response times. Experimental results show that IPPBE can reduce the time spent performing GC by up to 50.7% and 33.6% on average, read/write response time by up to 47.0%/45.4% and 16.5%/14.8% on average respectively, page movements by up to 52.2% and 26.6% on average, and blocks erased by up to 14.2% and 3.6% on average. An energy analysis conducted indicates that by reducing the number of page copies and the number of block erases, the energy cost of garbage collection can be reduced up to 44.1% and 19.3% on average.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78648971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jae-Whan Lee, Joo-Hyung Chae, Jihwan Park, Hyunkyu Park, Jaekwang Yun, Suhwan Kim
{"title":"Energy-Efficient Dynamic Comparator with Active Inductor for Receiver of Memory Interfaces","authors":"Jae-Whan Lee, Joo-Hyung Chae, Jihwan Park, Hyunkyu Park, Jaekwang Yun, Suhwan Kim","doi":"10.1145/3218603.3218620","DOIUrl":"https://doi.org/10.1145/3218603.3218620","url":null,"abstract":"In this paper, we propose a dynamic comparator that improved the operation performance of receiver (RX) with the effort to reduce power consumption. It is implemented via double-tail StrongARM latch comparator with an active inductor and efforts are made to minimize power consumption for high-speed resulting in better energy efficiency at the targeted high frequency. In this regard, our comparator is suitable for memory application RX to satisfy both low-power and high-speed. It is applied to the single-ended RX designed with a continuous-time linear equalizer, a clock generator and a quarter-rate 2-tap decision-feedback equalizer which is appropriate for the high-frequency memory application. Compared to the conventional one, our design, fabricated in 55nm CMOS process, provides an improvement of 7% in unit interval (UI) margin under the same power consumption and receives up to 10Gb/s PRBS15 data at BER < 10-12 with 0.4 UI margin and energy efficiency of 0.67pJ/bit.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82880420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Value-driven Synthesis for Neural Network ASICs","authors":"Zhiyuan Yang, Ankur Srivastava","doi":"10.1145/3218603.3218634","DOIUrl":"https://doi.org/10.1145/3218603.3218634","url":null,"abstract":"In order to enable low power and high performance evaluation of neural network (NN) applications, we investigate new design methodologies for synthesizing neural network ASICs (NN-ASICs). An NN-ASIC takes a trained NN and implements a chip with customized optimization. Knowing the NN topology and weights allows us to develop unique optimization schemes which are not available to regular ASICs. In this work, we investigate two types of value-driven optimized multipliers which exploit the knowledge of synaptic weights and we develop an algorithm to synthesize the multiplication of trained NNs using these special multipliers instead of general ones. The proposed method is evaluated using several Deep Neural Networks. Experimental results demonstrate that compared to traditional NNPs, our proposed NN-ASICs can achieve up to 6.5x and 55x improvement in performance and energy efficiency (i.e. inverse of Energy-Delay-Product), respectively.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76928671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yulhwa Kim, Hyungjun Kim, Daehyun Ahn, Jae-Joon Kim
{"title":"Input-Splitting of Large Neural Networks for Power-Efficient Accelerator with Resistive Crossbar Memory Array","authors":"Yulhwa Kim, Hyungjun Kim, Daehyun Ahn, Jae-Joon Kim","doi":"10.1145/3218603.3218605","DOIUrl":"https://doi.org/10.1145/3218603.3218605","url":null,"abstract":"Resistive Crossbar memory Arrays (RCA) have been gaining interest as a promising platform to implement Convolutional Neural Networks (CNN). One of the major challenges in RCA-based design is that the number of rows in an RCA is often smaller than the number of input neurons in a layer. Previous works used high-resolution Analog-to-Digital Converters (ADCs) to compute the partial weighted sum in each array and merged partial sums from multiple arrays outside the RCAs. However, such approach suffers from significant power consumption due to the need for high-resolution ADCs. In this paper, we propose a methodology to more efficiently construct a large CNN with multiple RCAs. By splitting the input feature map and retraining the CNN with proper initialization, we demonstrate that any CNN model can be represented with multiple arrays without using intermediate partial sums. The experimental results show that the ADC power of the proposed design is 32x smaller and the total chip power of the proposed design is 3x smaller than those of the baseline design.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86272574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy-Efficient High-Swing PAM-4 Voltage-Mode Transmitter","authors":"Lejie Lu, Yong Wang, Hui Wu","doi":"10.1145/3218603.3218651","DOIUrl":"https://doi.org/10.1145/3218603.3218651","url":null,"abstract":"As the data rate of high-speed I/Os continues to increase, four-level pulse amplitude modulation (PAM-4) is adopted to improve the bandwidth density and link margin at 50 Gb/s and beyond. Compared to non-return-to-zero (NRZ) signaling, however, the PAM-4 eye height is reduced, which calls for larger transmitter swing to maintain signal-to-noise-ratio. A new energy-efficient transmitter is proposed to generate large swing PAM-4 signals with a cascode voltage-mode driver and supporting pre-drivers and logic circuits. By reconfiguring the pull-up and pull-down branches based on the transmit data and steering the bypass currents, the proposed voltage-mode driver significantly reduces power consumption compared to conventional implementation while maintaining impedance matching. Voltage stacking technique is adopted for pre-drivers to further improve energy efficiency. To demonstrate the new transmitter design, a prototype 56 Gb/s PAM-4 transmitter is designed using a generic 28-nm CMOS technology with a 2-V power supply voltage. It achieves a overall output swing of 2 V and a minimum eye height of 490 mV with good linearity (98.7% level separation mismatch ratio). Compared to a conventional voltage-mode transmitter design with the same swing, the static power consumption of the new transmitter is reduced almost by half (from 30 mW to 16 mW), and its overall energy efficiency improves from 0.7 pJ/b to 0.5 pJ/b.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85937964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling of Hybrid Battery-Supercapacitor Control Instructions for Longevity in Systems with Power Gating","authors":"Sumanta Pyne","doi":"10.1145/3218603.3218609","DOIUrl":"https://doi.org/10.1145/3218603.3218609","url":null,"abstract":"The in-rush current due to wake-up of power gating (PG) components causes faster discharge of battery. This work introduces an instruction controlled hybrid battery-supercapacitor (B-SC) system for longer battery life in systems with instruction controlled PG. Two instructions have been introduced along with architectural support. The first instruction disconnects the battery from the PG components if the charge in the supercapacitor greater than or equal to the charge required by wake-up of PG components. The other instruction connects the battery to the PG components for recharging the supercapacitor. Disconnecting the battery during wake-up minimizes rate capacity effect (C-rate) for longer battery life. An algorithm is designed to schedule the proposed battery control instructions within a program having PG instructions. The efficacy of the proposed method is evaluated on MiBench and MediaBench benchmark programs. The proposed method reduces C-rate by an average of 14.25% at the cost of average performance loss of 6.87%.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85983472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Threshold Defined Camouflaged Gates in 65nm Technology for Reverse Engineering Protection","authors":"Anirudh Iyengar, Deepak Vontela, Ithihasa Reddy Nirmala, Swaroop Ghosh, Seyedhamidreza Motaman, Jaedong Jang","doi":"10.1145/3218603.3218641","DOIUrl":"https://doi.org/10.1145/3218603.3218641","url":null,"abstract":"Due to the ever-increasing threat of Reverse Engineering (RE) of Intellectual Property (IP) for malicious gains, camouflaging of logic gates is becoming very important. In this paper, we present experimental demonstration of transistor threshold voltage-defined switch [2] based camouflaged logic gates that can hide six logic functionalities i.e. NAND, AND, NOR, OR, XOR and XNOR. The proposed gates can be used to design the IP, forcing an adversary to perform brute-force guess-and-verify of the underlying functionality---increasing the RE effort. We propose two flavors of camouflaging, one employing only a pass transistor (NMOS-switch) and the other utilizing a full pass transistor (CMOS-switch). The camouflaged gates are used to design Ring-Oscillators (RO) in ST 65nm technology, one for each functionality, on which we have performed temperature, voltage, and process-variation analysis. We observe that CMOS-switch based camouflaged gate offers a higher performance (~1.5-8X better) than NMOS-switch based gate at an added area cost of only 5%. The proposed gates show functionality till 0.65V. We are also able to reclaim lost performance by dynamically changing the switch gate voltage and show that robust operation can be achieved at lower voltage and under temperature fluctuation.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91245089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Information Leakage Attacks on Emerging Non-Volatile Memory and Countermeasures","authors":"Mohammad Nasim Imtiaz Khan, Swaroop Ghosh","doi":"10.1145/3218603.3218649","DOIUrl":"https://doi.org/10.1145/3218603.3218649","url":null,"abstract":"Emerging Non-Volatile Memories (NVMs) suffer from high and asymmetric read/write current and long write latency which can result in supply noise, such as supply voltage droop and ground bounce. The magnitude of supply noise depends on the old data and the new data that is being written (for a write operation) or on the stored data (for a read operation). Therefore, victim's write operation creates a supply noise which propagates to adversary's memory space. The adversary can detect victim's write initiation and can leverage faster read latency (compared to write) to further sense the Hamming Weight (HW) of the victim's write data by detecting read failures in his memory space. These attacks are specifically possible if exhaustive testing of the memory for all patterns, all possible location combinations, all possible parallel read/write conditions are not performed under bit-to-bit process variations and specified (-10°C to 90°C) and unspecified temperature ranges (i.e., less than -10°C and greater than 90°C). Simulation result indicates that adversary can sense HW of victim's (near-by) write data = 66.77%, and further narrow the range based on read/write failure characteristics. Side Channel Attacks can utilize this information to strengthen the attacks.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78762513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Intrinsic and Database-free Watermarking in ICs by Exploiting Process and Design Dependent Variability in Metal-Oxide-Metal Capacitances","authors":"A. Shylendra, S. Bhunia, A. Trivedi","doi":"10.1145/3218603.3218608","DOIUrl":"https://doi.org/10.1145/3218603.3218608","url":null,"abstract":"Authentication of integrated circuits (IC) to verify their integrity has emerged as a critical need to address increasing concerns associated with counterfeit ICs in the supply chain. In this paper, novel SAR-ADC based intrinsic and database-free authentication scheme has been proposed. Proposed technique utilizes mismatch in back end of line (BEOL) capacitors used in charge-redistribution SAR ADC to generate authentication signature. BEOL metal-oxide-metal (MOM) capacitors form a reliable source of process variation information and are less sensitive to aging & temperature induced variations. Line edge roughness is the primary source of mismatch in BEOL capacitors and thus, capacitor mismatch variation has been analyzed in terms of LER and geometric parameters. Resource overhead incurred by the proposed modifications to the ADC architecture to incorporate authentication ability is minimal and existing on-chip calibration circuitry is used to extract signature. Proposed technique does not require sophisticated test setup, thereby, simplifying the authentication procedure.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87486496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}