{"title":"Exploring configurable non-volatile memory-based caches for energy-efficient embedded systems","authors":"Tosiron Adegbija","doi":"10.1145/2902961.2903009","DOIUrl":"https://doi.org/10.1145/2902961.2903009","url":null,"abstract":"Non-volatile memory (NVM) technologies have recently emerged as alternatives to traditional SRAM-based cache memories, since NVMs offer advantages such as non-volatility, low leakage power, fast read speed, and high density. However, NVMs also have disadvantages, such as high write latency and energy, which necessitate further research into robust optimization techniques. In this paper, we propose and evaluate configurable non-volatile memories (configNVM) as a viable NVM optimization technique, and show that configNVMs can reduce the cache's energy consumption by up to 60%, with minimal performance degradation. We describe the knowledge gaps that must be filled to enable configNVMs, and show that configNVMs offer new opportunities for energy efficient caching in embedded systems.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"356 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132894589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graphene-PLA (GPLA): A compact and ultra-low power logic array architecture","authors":"V. Tenace, A. Calimera, E. Macii, M. Poncino","doi":"10.1145/2902961.2902970","DOIUrl":"https://doi.org/10.1145/2902961.2902970","url":null,"abstract":"The key characteristics of the next generation of ICs for wearable applications include high integration density, small area, low power consumption, high energy-efficiency, reliability and enhanced mechanical properties like stretchability and transparency. The proper mix of new materials and novel integration strategies is the enabling factor to achieve those design specifications. Moving toward this goal, we introduce a graphene-based regular logic-array structure for energy efficient digital computing. It consists of graphene p-n junctions arranged into a regular mesh. The obtained structure resembles that of Programmable Logic Arrays (PLAs), hence the name Graphene-PLAs (GPLAs); the high expressive power of graphene p-n junctions and their resistive nature enables the implementation of ultra-low power adiabatic logic circuits.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130532115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of error-resilient logic gates with reinforcement using implications","authors":"Xijing Han, M. Donato, R. I. Bahar, A. Zaslavsky, W. Patterson","doi":"10.1145/2902961.2902983","DOIUrl":"https://doi.org/10.1145/2902961.2902983","url":null,"abstract":"Operating circuits in the sub-threshold region can save power, but at the cost of higher susceptibility to noise. This paper analyzes various gate-level error-mitigation designs appropriate for sub-threshold circuits. Previous works have proposed a modified version of the Schmitt trigger gate that uses logic implications to reinforce correct functional behavior. However, the increased error resilience requires increased area, delay, and power overhead. To address these shortcomings, we introduce two alternative and less costly approaches to reinforcing correct logic behavior via implications. In addition, to provide more flexibility in implication selection, we consider not just simple implications that reinforce relationships between two signals, but also more complex 3-signal implications within the circuit. Our simulation results demonstrate that these alternative gate structures can outperform the Schmitt trigger version as long as the noise on the reinforcement signals themselves is sufficiently low.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115115045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The applications of NVM technology in hardware security","authors":"Chaofei Yang, Beiye Liu, Yandan Wang, Yiran Chen, Hai Helen Li, Xian Zhang, Guangyu Sun","doi":"10.1145/2902961.2903043","DOIUrl":"https://doi.org/10.1145/2902961.2903043","url":null,"abstract":"The emerging nonvolatile memory (NVM) technologies have demonstrated great potentials in revolutionizing modern memory hierarchy because of their many promising properties: nanosecond read/write time, small cell area, non-volatility, and easy CMOS integration. It is also found that NVM devices can be leveraged to realize some hardware security solutions efficiently, such as physical unclonable function (PUF) and random number generator (RNG). In this paper, we summarize two of our works about using NVM devices to implement these hardware security features and compare them with conventional designs.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121280930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FCM: Towards fine-grained GPU power management for closed source mobile games","authors":"Jiachen Song, Xi Li, Beilei Sun, Zhinan Cheng, Chao Wang, Xuehai Zhou","doi":"10.1145/2902961.2902989","DOIUrl":"https://doi.org/10.1145/2902961.2902989","url":null,"abstract":"Contemporary mobile platforms employ embedded graphic processing units (GPUs) for graphics-intensive games, and dynamic voltage and frequency scaling (DVFS) policies are used to save energy without sacrificing quality. However, current GPU DVFS policies result in unnecessary power waste due to defective workload estimations of embedded GPUs during game play. In this paper, we propose the Frame-Complexity Model (FCM), a fine-grained estimation of the GPU workload in a game frame, to quantify the GPU workload with the real runtime demand for GPU computing resources of a game frame. In FCM, three constituents of a game frame (i.e., structure, textures and computation) are quantified without modification of mobile games. Preliminary experiments show that, compared with the default policy, the FCM-directed GPU DVFS policy can reduce more power consumption of games (11.3% to 25.8%) with good Quality of Service (QoS).","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116040098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Red-shield: Shielding read disturbance for STT-RAM based register files on GPUs","authors":"Hang Zhang, Xuhao Chen, Nong Xiao, Fang Liu, Zhiguang Chen","doi":"10.1145/2902961.2902988","DOIUrl":"https://doi.org/10.1145/2902961.2902988","url":null,"abstract":"To address the high energy consumption issue of SRAM on GPUs, emerging Spin-Transfer Torque (STT-RAM) memory technology has been intensively studied to build GPU register files for better energy-efficiency, thanks to its benefits of low leakage power, high density, and good scalability. However, STT-RAM suffers from a reliability issue, read disturbance, which stems from the fact that the voltage difference between read current and write current becomes smaller as technology scales. The read disturbance leads to high error rates for read operations, which cannot be effectively protected by SECDEC ECC on large-capacity register files of GPUs. Prior schemes (e.g. read-restore) to mitigate the read disturbance usually incur either non-trivial performance loss or excessive energy overhead, thus not applicable for the GPU register file design which aims to achieve both high performance and energy-efficiency. To combat the read disturbance on GPU register files, we propose a novel software-hardware co-designed solution, i.e. Red-Shield, which consists of three optimizations to overcome limitations of the existing solutions. First, we identify dead reads at compiling stage and augment instructions to avoid unnecessary restores. Second, we employ a small read buffer to accommodate register reads with high access locality to further reduce restores. Third, we propose an adaptive restore mechanism to selectively pick the suitable restore scheme, according to the busy status of corresponding register banks. Experimental results show that our proposed design can effectively mitigate the performance loss and energy overhead caused by restore operations, while still maintaining the reliability of reads.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128754188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduced overhead gate level logic encryption","authors":"Kyle Juretus, I. Savidis","doi":"10.1145/2902961.2902972","DOIUrl":"https://doi.org/10.1145/2902961.2902972","url":null,"abstract":"Untrusted third-parties are found throughout the integrated circuit (IC) design flow resulting in potential threats in IC reliability and security. Threats include IC counterfeiting, intellectual property (IP) theft, IC overproduction, and the insertion of hardware Trojans. Logic encryption has emerged as a method of enhancing security against such threats, however, current implementations of logic encryption, including the XOR or look-up table (LUT) techniques, have high per-gate overheads in area, performance, and power. A novel gate level logic encryption technique with reduced per-gate overheads is described in this paper. In addition, a technique to expand the search space of a key sequence is provided, increasing the difficulty for an adversary to extract the key value. A power reduction of 41.50%, an estimated area reduction of 43.58%, and a performance increase of 34.54% is achieved when using the proposed gate level logic encryption instead of the LUT based technique for an encrypted AND gate.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127069596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time hardware stereo matching using guided image filter","authors":"Chen-Yu Yang, Yan Li, Wei Zhong, Song Chen","doi":"10.1145/2902961.2902995","DOIUrl":"https://doi.org/10.1145/2902961.2902995","url":null,"abstract":"Stereo matching is a key step in stereo vision systems that require high accurate depth information and real-time processing of high definition image streams. This work presents a high-accuracy hardware implementation for the stereo matching based on the guided image filter, which is an edge-preserving filter and simplifies the adaptive support window algorithm. The coefficients in the guided image filter are calculated by the proposed mean filter tree structure, which saves hardware resources by sharing large amounts of additions among filter operations. The reference image is enhanced using Laplacian Filter, which improves the accuracy for the discontinuous disparity regions. Moreover, an 8×8 matching window and customized ping-pong caches are used to improve the whole throughputs. The proposed hardware architecture is implemented on a Cyclone IV FPGA resulting in a throughput of 1080p resolution images at 80fps with high accuracy of disparity.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122280192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring main memory design based on racetrack memory technology","authors":"Qingda Hu, Guangyu Sun, J. Shu, Chao Zhang","doi":"10.1145/2902961.2902967","DOIUrl":"https://doi.org/10.1145/2902961.2902967","url":null,"abstract":"Emerging non-volatile memories (NVMs), which include PC-RAM and STT-RAM, have been proposed to replace DRAM, mainly because they have better scalability and lower standby power. However, previous research has demonstrated that these NVMs cannot completely replace DRAM due to either lifetime/performance (PCRAM) or density (STT-RAM) issues. Recently, a new type of emerging NVM, called Racetrack Memory (RM), has attracted more and more attention of memory researchers because it has ultra-high density and fast access speed without the write cycle issue. However, there lacks research on how to leverage RM for main memory. To this end, we explore main memory design based on RM technology in both circuit and architecture levels. In the circuit level, we propose the structure of the RM based main memory and investigate different design parameters. In the architecture level, we design a simple and efficient shift-sense address mapping policy to reduce 95% shift operations for performance improvement and power saving. At the same time, we analyze the efficiency of existing optimization strategies for NVM main memory. Our experiments show that RM can outperform DRAM for main memory, in respect of density, performance, and energy efficiency.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117258787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning-based near-optimal area-power trade-offs in hardware design for neural signal acquisition","authors":"C. Aprile, Luca Baldassarre, Vipul Gupta, Juhwan Yoo, Mahsa Shoaran, Y. Leblebici, V. Cevher","doi":"10.1145/2902961.2903028","DOIUrl":"https://doi.org/10.1145/2902961.2903028","url":null,"abstract":"Wireless implantable devices capable of monitoring the electrical activity of the brain are becoming an important tool for understanding and potentially treating mental diseases such as epilepsy and depression. While such devices exist, it is still necessary to address several challenges to make them more practical in terms of area and power dissipation. In this work, we apply Learning Based Compressive Sub-sampling (LBCS) to tackle the power and area trade-offs in neural wireless devices. To this end, we propose a low-power and area-efficient system for neural signal acquisition which yields state-of-art compression rates up to 64× with high reconstruction quality, as demonstrated on two human iEEG datasets. This new fully digital architecture handles one neural acquisition channel, with an area of 210 × 210μm in 90nm CMOS technology, and a power dissipation of only 1μW.","PeriodicalId":407054,"journal":{"name":"2016 International Great Lakes Symposium on VLSI (GLSVLSI)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134139852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}