{"title":"Comprehensive Analysis, Modeling and Design for Hold-Timing Resiliency in Voltage Scalable Design","authors":"Huanyu Wang, Geng Xie, Jie Gu","doi":"10.1145/2934583.2934584","DOIUrl":"https://doi.org/10.1145/2934583.2934584","url":null,"abstract":"Resiliency to timing violation is a crucial requirement for low power electronics operating across a wide range of supply voltages. Although many existing solutions enhance setup timing tolerance for higher performance, an accurate modeling and design strategy for hold resiliency dealing with conflicting requirements from both high voltages and low voltages has not been established. This paper proposes a novel voltage-scalable modeling technique that leverages conventional static timing analysis and efficient statistical analysis to achieve accurate stochastic hold timing analysis. Several highly nonlinear behaviors of circuit operation are also incorporated into the proposed model to achieve a model accuracy of within 10% of SPICE Monte Carlo simulation. Leveraging the proposed modeling technique, a novel hold resilience design technique is proposed to eliminate the excessive hold fixing operation for low voltage operation and its associated performance degradation at high voltage while still being compatible with conventional design closure flow. The proposed design methodology is demonstrated in a 45nm DSP processor design enabling a voltage-scalable operation from 0.35V to 0.9V eliminating more than 20,000 hold buffers as well as 23% performance degradation at high voltages due to hold fixing.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130820009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Voltage Noise Induced DRAM Soft Error Reduction Technique for 3D-CPUs","authors":"Tiantao Lu, Caleb Serafy, Zhiyuan Yang, Ankur Srivastava","doi":"10.1145/2934583.2934589","DOIUrl":"https://doi.org/10.1145/2934583.2934589","url":null,"abstract":"Three-dimensional integration enables stacking DRAM on top of CPU, providing high bandwidth and short latency. However, non-uniform voltage fluctuation and local thermal hotspot in CPU layers are coupled into DRAM layers, causing a non-uniform bit-cell leakage (thereby bit flip) distribution. We propose a performance-power-resilience simulation framework to capture DRAM soft error in 3D multi-core CPU systems. A dynamic resilience management (DRM) scheme is investigated, which adaptively tunes CPU's operating points to adjust DRAM's voltage noise and thermal condition during runtime. The DRM uses dynamic frequency scaling to achieve a resilience borrow-in strategy, which effectively enhances DRAM's resilience without sacrificing performance.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123135457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Novel Technologies & Resilience Design","authors":"Swaroop Ghosh, Tsung-Te Liu","doi":"10.1145/3256011","DOIUrl":"https://doi.org/10.1145/3256011","url":null,"abstract":"","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122196809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Impact of Magnetic and Thermal Attack on STTRAM and Low-Overhead Mitigation Techniques","authors":"Jaedong Jang, Swaroop Ghosh","doi":"10.1145/2934583.2934614","DOIUrl":"https://doi.org/10.1145/2934583.2934614","url":null,"abstract":"In this paper, we analyze the fundamental vulnerabilities of Spin-Torque-Transfer RAM on magnetic field and temperature that can be exploited by adversaries with an intent to trigger soft performance failures. We present novel attack vectors and their impact on memory performance (i.e., read, write and retention). We propose a novel low-overhead clock frequency-adaptation technique to mitigate the attack. Our analysis indicates slowing the clock frequency by 85% restores 170 mV of sense margin under 300 Oe DC magnetic field. In addition, 66% operating clock slowdown allows STTRAM to tolerate over 300 Oe AC magnetic field.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114153059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Infrastructure for Enabling FPGA-Based Accelerations in Data Centers: Invited Paper","authors":"J. Cong, Muhuan Huang, P. Pan, Di Wu, Peng Zhang","doi":"10.1145/2934583.2953984","DOIUrl":"https://doi.org/10.1145/2934583.2953984","url":null,"abstract":"This paper focuses on the development of an infrastructure to enable FPGA-based acceleration in data centers. We present an initial version of an integrated solution that includes automated compilation for accelerator generation, runtime accelerator resource scheduling and management, and acceleration libraries for FPGA-based customized computing for big data applications. The solution can help overcome some of the main challenges with FPGA-based accelerated computing. It has the potential to bring significant performance and energy efficiency improvement for data center applications.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132943942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Therma: Thermal-aware Run-time Thread Migration for Nanophotonic Interconnects","authors":"Majed Valad Beigi, G. Memik","doi":"10.1145/2934583.2934592","DOIUrl":"https://doi.org/10.1145/2934583.2934592","url":null,"abstract":"In this paper, we introduce Therma, a thermal-aware run-time thread migration mechanism for managing temperature fluctuations in nanophotonic networks. Nanophotonics is one of the most promising communication substrate candidates for next-generation high-performance systems. However, their underlying components are sensitive to temperature fluctuations. These fluctuations arise mostly because of the temperature changes on the cores, which are adjacent to nanophotonic components. Therma minimizes thermal fluctuations on these temperature sensitive components by moving threads across cores. Evaluation results reveal that when each core is executing a single thread, Therma achieves a 15.4% and 6.1% reduction in the photonic power consumption compared to the baseline and an interconnect-oblivious thread migration scheme, respectively. It also reduces photonic power consumption by up to 20.7% compared to the alternatives when running multiple threads per core on the system.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133211388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy-Efficient PUF Design: Computing While Racing","authors":"Hongxiang Gu, T. Xu, M. Potkonjak","doi":"10.1145/2934583.2934604","DOIUrl":"https://doi.org/10.1145/2934583.2934604","url":null,"abstract":"Physical unclonable functions (PUFs) take advantage of the effect of process variation on hardware to obtain their unclonability. Traditional PUF design only focuses on the analog signals of circuits. An arbiter PUF, for example, generates responses by racing delay signals. Implementations of such PUFs usually employ large area and power consumption while providing very low throughput. To address this problem, we propose an energy efficient PUF design in such a way that it races analog signals and computes digital logic simultaneously. More importantly, the analog portion of the circuit (racing) shares a large amount of hardware resources with the digital portion of the circuit (computing) by introducing only small overhead in terms of area and power. Our test results on Spartan-6 field-programmable gate array (FPGA) platforms indicate that by combining the two outputs, our design enables much larger PUF output throughput, better randomness and less power consumption compared to traditional PUFs.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115539718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low Area, Low Power, Robust, Highly Sensitive Error Detecting Latch for Resilient Architectures","authors":"Weizhe Hua, R. Tadros, P. Beerel","doi":"10.1145/2934583.2934600","DOIUrl":"https://doi.org/10.1145/2934583.2934600","url":null,"abstract":"Operating at lower supply voltages to meet ever-increasing demands for power-efficiency unfortunately aggravates process, voltage, and temperature (PVT) variability. Resilient architectures have emerged as a promising way to mitigate widening worst-case margins at these voltages. In particular, timing resilient architectures use extra circuitry to detect timing violations and recover normal operation. The error detecting latch (EDL) is an efficient circuit that helps perform this task. This paper proposes two EDL architectures that achieve as much as 11.2% less power consumption, 20.8% less leakage, 7.8% smaller area, and 18.2% better sensitivity to glitches compared to state-of-the-art EDLs. The paper offers two different flavors trading off robustness for lower power and vice versa. The paper also proposes a comprehensive power metric encapsulating many of the various energy aspects discussed in the literature.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115530189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dissecting Xeon + FPGA: Why the integration of CPUs and FPGAs makes a power difference for the datacenter: Invited Paper","authors":"H. Schmit, Randy Huang","doi":"10.1145/2934583.2953983","DOIUrl":"https://doi.org/10.1145/2934583.2953983","url":null,"abstract":"Intel's Xeon roadmap includes package-integrated FPGAs in every new generation. In this talk, we will dissect why this is such a powerful combination at this time of great change in datacenter workloads. We will show how power savings within the CPU complex is a significant multiplier for power savings in the datacenter as a whole. Focusing on the domain of machine learning, we will present the recent evolution of data types and operators, and make the case that FPGAs are the path to facilitate this continued evolution. Finally, we will discuss the criticality of the close coupling of the CPU and the FPGA. This coupling facilitates high bandwidth and low latency communication that is required for the development, debugging and deployment of heterogeneous applications.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123098522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable Auto-Tuning of Synthesis Parameters for Optimizing High-Performance Processors","authors":"M. Ziegler, Hung-Yi Liu, L. Carloni","doi":"10.1145/2934583.2934620","DOIUrl":"https://doi.org/10.1145/2934583.2934620","url":null,"abstract":"Modern logic and physical synthesis tools provide numerous options and parameters that can drastically impact design quality; however, the large number of options leads to a complex design space difficult for human designers to navigate. By employing intelligent search strategies and parallel computing we can tackle this parameter tuning problem, thus automating one of the key design tasks conventionally performed by a human designer. In this paper we present a novel learning-based algorithm for synthesis parameter optimization. This new algorithm has been integrated into our existing autonomous parameter-tuning system, which was used to design multiple 22nm industrial chips and is currently being used for 14nm chips. These techniques show, on average, over 40% reduction in total negative slack and over 10% power reduction across hundreds of 14nm industrial processor macros while reducing overall human design effort. We also present a new higher-level system that manages parameter tuning of multiple designs in a scalable way. This new system addresses the needs of large design teams by prioritizing the tuning effort to maximize returns given the available compute resources.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123287530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}