{"title":"Enhancing DRAM Self-Refresh for Idle Power Reduction","authors":"Byoungchan Oh, Nilmini Abeyratne, Jeongseob Ahn, R. Dreslinski, T. Mudge","doi":"10.1145/2934583.2934632","DOIUrl":"https://doi.org/10.1145/2934583.2934632","url":null,"abstract":"DRAM can enter self-refresh mode to save power during idle periods. However, self-refresh mode does not modify or reduce the number of refresh operations, so the refresh energy stays the same. We observe that in self-refresh mode DRAM cells are in one of two distinct modes, static (idle) and dynamic (refreshing), and that the switching between these modes is predictable. In this paper, we propose two new self-refresh modes to improve the power efficiency of DRAM: Enhanced Self-Refresh (ESR) and Long latency Self-Refresh (LSR). The key idea is to optimize the leakage current of DRAM cells by selectively applying different voltage levels to the DRAM cell transistors when they are active (accessed for refreshing) and idle (pre-charged) by adjusting both the word-line and body voltages. With our techniques, the retention time of DRAM cells is improved. In our SPICE and mathematical models, ESR and LSR modes result in a 39% and 48% DRAM self-refresh power reduction compared to the existing self-refresh mode, respectively. A workload analysis of ESR shows DRAM energy savings on average of 22%. In addition, for the long idle periods in server systems, the LSR mode can reduce DRAM idle power by nearly 50%, which results in a 6.5% total system idle power reduction.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128328130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding up Convolutional Neural Network Training with Dynamic Precision Scaling and Flexible Multiplier-Accumulator","authors":"Taesik Na, S. Mukhopadhyay","doi":"10.1145/2934583.2934625","DOIUrl":"https://doi.org/10.1145/2934583.2934625","url":null,"abstract":"Training convolutional neural networks is a major bottleneck when developing a new neural network topology. This paper presents a dynamic precision scaling (DPS) algorithm and a flexible multiplier-accumulator (MAC) to speed up convolutional neural network training. The DPS algorithm utilizes dynamic fixed point and finds sufficient numerical precision for the target network during training. The precision information from DPS is used to configure our proposed MAC. The proposed MAC can perform fixed-point computation in a variable precision mode, providing differentiated computation time, which enables speeding up training for lower-precision computation. Simulation results show that our work can achieve 5.7x speed-up while consuming 31% energy compared to the baseline for modified AlexNet on the Flickr image style recognition task.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"132 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132781598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Design Methodology for 3D IC","authors":"Liangzhen Lai, M. Ziegler","doi":"10.1145/3256014","DOIUrl":"https://doi.org/10.1145/3256014","url":null,"abstract":"","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131696758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bit Serializing a Microprocessor for Ultra-low-power","authors":"Matthew Tomei, Henry Duwe, N. Kim, Rakesh Kumar","doi":"10.1145/2934583.2934597","DOIUrl":"https://doi.org/10.1145/2934583.2934597","url":null,"abstract":"Many emerging sensor applications are powered by energy harvesters that impose strict power constraints. These applications often do not require high performance or energy efficiency. We explore a technique for minimizing power of a microprocessor for power constrained applications: bit serial computing. Bit serial computing promises power benefits up to the data width for fully bit serializable logic. We perform a best-effort bit serialization of the openMSP430 microprocessor without making instruction set architecture (ISA) modifications. Although it is very challenging to serialize much of the logic in the microprocessor, we show that power benefits of serialization exceed 42% when the serial and parallel designs synthesized for their maximum operating frequency are running at a low duty cycle. Benefits are expected to be higher when ISA modifications are allowed.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Architectures for Approximate Computing","authors":"Xi Chen, S. Parameswaran","doi":"10.1145/3256019","DOIUrl":"https://doi.org/10.1145/3256019","url":null,"abstract":"","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117287777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ACAM: Approximate Computing Based on Adaptive Associative Memory with Online Learning","authors":"M. Imani, Yeseong Kim, Abbas Rahimi, T. Simunic","doi":"10.1145/2934583.2934595","DOIUrl":"https://doi.org/10.1145/2934583.2934595","url":null,"abstract":"The Internet of Things (IoT) dramatically increases the amount of data to be processed for many applications including multimedia. Unlike traditional computing environments, the workload of IoT varies significantly over time. Thus, efficient runtime profiling is required to extract highly frequent computations and pre-store them for memory-based computing. In this paper, we propose an approximate computing technique using a low-cost adaptive associative memory, named ACAM, which utilizes runtime learning and profiling. To recognize the temporal locality of data in real-world applications, our design exploits a reinforcement learning algorithm with a least recently used (LRU) strategy to select images to be profiled; the profiler is implemented using an approximate concurrent state machine. The profiling results are then stored into ACAM for computation reuse. Since the selected images represent the observed input dataset, we can avoid redundant computations thanks to high hit rates displayed in the associative memory. We evaluate ACAM on the recent AMD Southern Islands GPU architecture, and the experimental results show that the proposed design achieves 34.7% energy savings for image processing applications with an acceptable quality of service (i.e., PSNR>30dB).","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121069641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TeleProbe: Zero-power Contactless Probing for Implantable Medical Devices","authors":"Woo Suk Lee, Younghyun Kim, V. Raghunathan","doi":"10.1145/2934583.2934593","DOIUrl":"https://doi.org/10.1145/2934583.2934593","url":null,"abstract":"The lack of post-deployment visibility into system behavior is one of the major challenges in ensuring the reliable operation of implantable medical devices (IMDs). While wireless connectivity is becoming common in IMDs for monitoring device status, conventional wireless links incur significant energy overheads for data acquisition, processing, and active radio transmission. While low-power transceivers have been introduced to reduce the energy consumed by the radio itself, the energy consumed by the microcontroller for processing data and controlling the radio has often been overlooked. As a result, in IMDs that have a stringent energy constraint, prolonged signal monitoring over a wireless channel is infeasible due to this prohibitively high power consumption. To address this challenge, we present TeleProbe, which enables zero-power probing of signals in IMDs. TeleProbe is a wireless, oscilloscope-like probing mechanism for IMDs that allows direct readout of analog/digital signals in real time using an LC tank circuit, without any power overhead imposed on the IMD. We have designed and implemented fully functional prototypes of TeleProbe and demonstrated its utility in the context of two practical usage scenarios: in-situ power analysis and off-chip serial communication bus monitoring in IMDs.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129403523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft Response Generation and Thresholding Strategies for Linear and Feed-Forward MUX PUFs","authors":"Chen Zhou, Saroj Satapathy, Yingjie Lao, K. Parhi, C. Kim","doi":"10.1145/2934583.2934613","DOIUrl":"https://doi.org/10.1145/2934583.2934613","url":null,"abstract":"In this work, we present probability-based response generation schemes for MUX-based Physical Unclonable Functions (PUFs). Compared to previous implementations, where temporal majority voting (TMV) based on limited samples and coarse criteria was utilized to determine final responses, our design can collect soft responses with detailed probability information using simple on-chip circuits. Thresholds with fine accuracy are applied to efficiently distinguish stable and unstable challenge response pairs (CRPs). A 32nm test chip including both linear and feed-forward MUX PUFs was implemented for concept verification. Based on a detailed analysis of the hardware data, we propose several enhanced thresholding strategies for determining stable CRPs. For instance, a stringent threshold can be imposed in the enrollment phase for selecting good CRPs, while a relaxed threshold can be used during the normal authentication phase. Experimental data show a high degree of uniqueness and randomness in the PUF responses, which can be attributed to the carefully optimized circuit layout. Finally, the output characteristic of a feed-forward MUX PUF was compared to that of a standard linear MUX PUF from the same 32nm chip.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"401 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123094381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 386-μW, 15.2-bit Programmable-Gain Embedded Delta-Sigma ADC for Sensor Applications","authors":"Jaehoon Jun, C. Rhee, Suhwan Kim","doi":"10.1145/2934583.2934636","DOIUrl":"https://doi.org/10.1145/2934583.2934636","url":null,"abstract":"A power-efficient Delta-Sigma (ΔΣ) analog-to-digital converter (ADC) with an embedded programmable-gain control function is presented for various smart sensor applications. It consists of a programmable-gain switched-capacitor ΔΣ modulator followed by a digital decimation filter for down-sampling. The programmable function is realized with programmable loop-filter coefficients using a capacitor array. The coefficients are controlled while keeping the pole locations of the noise transfer function fixed, so the stability of the designed closed-loop transfer function is assured. The proposed gain control method helps the ADC optimize its performance with varying input signal magnitude. The gain controllability requires negligible additional energy-consuming or area-occupying circuitry. The power-efficient programmable-gain ADC (PGADC) is well-suited for sensor devices. The gain can be set from 0 to 18 dB in 6 dB steps. Measurements show that the PGADC achieves 15.2-bit resolution and 12.4-bit noise-free resolution with 99.9% reliability. The chip operates with a 3.3 V analog supply and a 1.8 V digital supply, while consuming only 97 μA analog current and 37 μA digital current. The analog core area is 0.064 mm² in a standard 0.18-μm CMOS process.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129022846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction-Guided Performance-Energy Trade-off with Continuous Run-Time Adaptation","authors":"Taejoon Song, Daniel Lo, G. Suh","doi":"10.1145/2934583.2934598","DOIUrl":"https://doi.org/10.1145/2934583.2934598","url":null,"abstract":"Recent work has demonstrated that prediction-guided DVFS control can significantly improve the energy efficiency of interactive applications with little to no impact on user experience when running in isolation. In this work, we propose to add an on-line learning capability to the execution-time predictor, which enables the predictor to automatically adapt to changes in the environment such as interference from other applications and be easily applied across diverse platforms. This paper introduces several techniques to address the overhead of performing on-line learning, including incremental training based on QR decomposition and explicit change detection for fast adaptation. In addition to the DVFS control, we show that the proposed prediction model can be used to intelligently select a core in a heterogeneous system. Experimental results on the ARM big.LITTLE platform show that our DVFS controller and core scheduler can effectively remove deadline misses even under significant interference from competing processes while consuming far lower energy compared to traditional schemes.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130496887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}