Proceedings of the 2016 International Symposium on Low Power Electronics and Design: Latest Papers

Power-Aware Performance Adaptation of Concurrent Applications in Heterogeneous Many-Core Systems
Ali Aalsaud, R. Shafik, A. Rafiev, Fei Xia, Sheng Yang, A. Yakovlev
DOI: 10.1145/2934583.2934612 (https://doi.org/10.1145/2934583.2934612)
Published: 2016-08-08
Abstract: Modern embedded systems execute multiple applications, both sequentially and concurrently. These applications are exercised on heterogeneous platforms, generating varying power consumption and system workloads (CPU-intensive, memory-intensive, or both). As a result, determining the most energy-efficient system configuration (i.e. the number of parallel threads, their core allocations and operating frequencies) tailored to each kind of workload and application scenario is extremely challenging. In this paper, we propose a novel runtime optimization approach that aims to maximize power-normalized performance under dynamic variation of workloads and application scenarios. Fundamental to this approach is a comprehensive study of the tradeoffs between inter-application concurrency, performance and power consumption under different system configurations. Using real experimental measurements on an Odroid XU-3 heterogeneous platform running a number of PARSEC benchmark applications, we model power-normalized performance (in terms of IPS/Watt), underpinned by analytical power and performance models derived through multivariate linear regression (MLR). Using these models, we show that, as the number of concurrent applications increases, CPU-intensive applications show variable gains in IPS/Watt compared to memory-intensive applications, in both sequential and concurrent application scenarios. Furthermore, we demonstrate that it is possible to continuously adapt the system configuration through a low-cost, linear-complexity runtime algorithm, which can improve IPS/Watt by up to 125% compared to the existing approach.
Citations: 44
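The abstract describes choosing a configuration (thread count, core allocation, frequency) that maximizes IPS/Watt using regression-derived power and performance models, with a linear-complexity runtime step. The sketch below is not the authors' algorithm: it assumes the MLR models have already been fitted, uses hypothetical placeholder coefficients and a hypothetical search space, and simply evaluates IPS/Watt once per candidate configuration.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    threads: int      # number of parallel threads
    big_cores: int    # how many of those threads run on big cores (rest on LITTLE)
    freq_ghz: float   # operating frequency in GHz

# Hypothetical fitted coefficients: (intercept, threads, big_cores, freq_ghz).
# Placeholders for illustration only, not values from the paper.
IPS_COEFFS = (0.2, 0.5, 0.9, 1.1)
POWER_COEFFS = (0.3, 0.1, 0.6, 0.8)

def linear_model(coeffs, cfg: Config) -> float:
    """Evaluate y = c0 + c1*threads + c2*big_cores + c3*freq (an MLR-style model)."""
    c0, c_thr, c_big, c_freq = coeffs
    return c0 + c_thr * cfg.threads + c_big * cfg.big_cores + c_freq * cfg.freq_ghz

def ips_per_watt(cfg: Config) -> float:
    """Power-normalized performance predicted by the two linear models."""
    return linear_model(IPS_COEFFS, cfg) / linear_model(POWER_COEFFS, cfg)

def best_config(candidates):
    """One pass over the candidates, so the cost is linear in their number."""
    best = max(candidates, key=ips_per_watt)
    return best, ips_per_watt(best)

# Hypothetical search space: 1-4 threads, any split onto big cores, four frequencies.
CANDIDATES = [
    Config(t, b, f)
    for t, f in product(range(1, 5), (0.6, 1.0, 1.4, 1.8))
    for b in range(t + 1)
]
```

In a real controller the coefficients would come from regression over measured IPS and power, and the candidate set would be refreshed as the workload mix changes.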
Normally-OFF STT-MRAM Cache with Zero-Byte Compression for Energy Efficient Last-Level Caches
Fabian Oboril, F. Hameed, R. Bishnoi, A. Ahari, Helia Naeimi, M. Tahoori
DOI: 10.1145/2934583.2934629 (https://doi.org/10.1145/2934583.2934629)
Published: 2016-08-08
Abstract: Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) is a promising alternative to SRAM due to its low leakage and scalability advantages. However, although more energy-efficient than SRAM, STT-MRAM caches at higher levels (e.g. L3) still incur high energy consumption due to 1) high leakage in their read and write circuits and 2) high dynamic write energy in their bit-cells. To address this problem, we propose a novel normally-off STT-MRAM cache that exploits the fact that most applications access zero-byte patterns very frequently. In this architecture, writing of zero-bytes is avoided to reduce write energy. In addition, all read and write circuits are power gated by default (i.e. normally-off) to reduce leakage power; at runtime, only the circuits required for the ongoing operation are dynamically activated. Our evaluations for an L3 cache of a multi-core microprocessor show that this approach reduces energy consumption by 60% compared to the state of the art, while its impact on performance is negligible.
Citations: 6
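The key idea of avoiding zero-byte writes can be illustrated with a toy model of one cache line. This is a sketch under assumptions, not the paper's circuit: it assumes one zero flag per byte stored alongside the line, skips programming the bit-cells of any zero byte, and counts programmed bytes as a rough proxy for write energy. The power gating of read/write circuits is not modeled.

```python
LINE_SIZE = 64  # bytes per cache line (assumed)

class ZeroByteLine:
    """Toy cache line with an assumed per-byte zero flag."""

    def __init__(self, size: int = LINE_SIZE):
        self.cells = bytearray(size)    # STT-MRAM bit-cells, one byte each
        self.zero_flag = [True] * size  # True => byte reads back as 0x00

    def write(self, data: bytes) -> int:
        """Write a full line; return how many data bytes were actually programmed."""
        if len(data) != len(self.cells):
            raise ValueError("data must fill the whole line")
        programmed = 0
        for i, b in enumerate(data):
            if b == 0:
                # Only the flag changes; the (stale) cell contents are left untouched.
                self.zero_flag[i] = True
            else:
                self.zero_flag[i] = False
                self.cells[i] = b
                programmed += 1
        return programmed

    def read(self) -> bytes:
        """Reconstruct the line: flagged bytes read as zero, others from the cells."""
        return bytes(0 if z else c for z, c in zip(self.zero_flag, self.cells))
```

For a line that is mostly zeros, such as `bytes([0] * 60 + [1, 2, 3, 4])`, only 4 of the 64 bytes are programmed; the flag array itself still has to be stored and written, which a real design must account for.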
Ferroelectric Transistor based Non-Volatile Flip-Flop
Danni Wang, Sumitha George, Ahmedullah Aziz, S. Datta, N. Vijaykrishnan, S. Gupta
DOI: 10.1145/2934583.2934603 (https://doi.org/10.1145/2934583.2934603)
Published: 2016-08-08
Abstract: We present a non-volatile flip-flop that can back up its state in a ferroelectric transistor (FEFET) during power failure or supply gating. The data is stored as the polarization of the ferroelectric (FE) layer in the gate stack of the FEFET. The proposed flip-flop exploits the non-volatility of the three-terminal FEFET to optimize the data backup and restore operations. We perform an extensive device-circuit analysis to provide insights into the design of the proposed flip-flop. We discuss the optimization of the FE thickness in the gate stack of the FEFET to introduce suitable non-volatility and present the implications at the circuit level. Our analysis shows that, by virtue of the three-terminal structure of the FEFET and the order-of-magnitude difference in current between the two polarization states, the design of the backup/restore module is considerably simplified. Compared to an FE-capacitor-based non-volatile flip-flop, the proposed flip-flop achieves 40% to 50% smaller backup delay, 27% to 40% lower backup energy, comparable restore delay, and up to an order of magnitude lower restore energy. While the FE-capacitor-based design incurs a 76% area penalty compared to a conventional (volatile) flip-flop, the proposed design incurs only 35% area overhead.
Citations: 33
Exploiting Fully Integrated Inductive Voltage Regulators to Improve Side Channel Resistance of Encryption Engines
Monodeep Kar, Arvind Singh, S. Mathew, Anand Rajan, V. De, S. Mukhopadhyay
DOI: 10.1145/2934583.2934607 (https://doi.org/10.1145/2934583.2934607)
Published: 2016-08-08
Abstract: This paper explores fully integrated inductive voltage regulators (FIVR) as a technique to improve the side channel resistance of encryption engines. We propose security-aware design modes for a low passive FIVR to improve the robustness of an encryption engine against statistical power attacks in the time and frequency domains. Correlation Power Analysis (CPA) is used to attack a 128-bit AES engine synthesized in 130nm CMOS. The original design requires ~250 Measurements to Disclose (MTD) the 1st byte of the key, but with the security-aware FIVR, the CPA was unsuccessful even after 20,000 traces. We present a reversibility-based threat model for the FIVR-based protection and show the robustness of the security-aware FIVR against such a threat.
Citations: 21
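The attack that the security-aware FIVR is meant to defeat, Correlation Power Analysis, can be sketched on synthetic data. Everything here is an assumption for illustration: a toy randomly shuffled 8-bit S-box stands in for the AES S-box, the "traces" are simulated as the Hamming weight of the S-box output plus Gaussian noise, and the secret key byte is arbitrary. Real attacks use measured power traces and the actual AES S-box.

```python
import random

def hamming_weight(x: int) -> int:
    return bin(x).count("1")

def pearson(xs, ys) -> float:
    """Sample Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def cpa_recover_key_byte(plaintexts, traces, sbox):
    """Return the key-byte guess whose predicted leakage best correlates with the traces."""
    best_guess, best_corr = None, -1.0
    for guess in range(256):
        # Hamming-weight leakage model of the S-box output under this key guess.
        predicted = [hamming_weight(sbox[p ^ guess]) for p in plaintexts]
        corr = abs(pearson(predicted, traces))
        if corr > best_corr:
            best_guess, best_corr = guess, corr
    return best_guess, best_corr

# --- simulated experiment (toy S-box and synthetic traces, not measurements) ---
rng = random.Random(2016)
SBOX = list(range(256))
rng.shuffle(SBOX)  # toy bijective S-box, NOT the AES S-box
SECRET = 0x3C
plaintexts = [rng.randrange(256) for _ in range(500)]
traces = [hamming_weight(SBOX[p ^ SECRET]) + rng.gauss(0.0, 1.0) for p in plaintexts]
```

Calling `cpa_recover_key_byte(plaintexts, traces, SBOX)` singles out the secret byte because its predicted leakage is the only one strongly correlated with the traces; a countermeasure succeeds when it pushes that correlation down into the noise of the wrong guesses.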
DynSleep: Fine-grained Power Management for a Latency-Critical Data Center Application
C. Chou, Daniel Wong, L. Bhuyan
DOI: 10.1145/2934583.2934616 (https://doi.org/10.1145/2934583.2934616)
Published: 2016-08-08
Abstract: Servers running in datacenters are commonly kept underutilized to meet stringent latency targets. Due to the poor energy proportionality of commodity servers, this low utilization results in wasteful power consumption that costs millions of dollars. Applying dynamic power management to datacenter workloads is challenging, especially when tail latency requirements often fall in the sub-millisecond range. The fundamental issue is randomness due to unpredictable request arrival times and request service times. Prior techniques applied per-core DVFS to gain fine-grained control over slowing down request processing without violating the tail latency target. However, most commodity servers support only per-core DFS, which greatly limits potential energy savings. In this paper, we propose DynSleep, a fine-grained power management scheme for datacenter workloads that uses per-core sleep states (C-states). DynSleep dynamically postpones the processing of some requests, creating longer idle periods that allow the use of deeper C-states to save energy. We design and implement DynSleep with Memcached, a popular key-value store application used in datacenters. The experimental results show that DynSleep achieves up to 65% core power savings, which is 27% better than the per-core DVFS power management scheme, while still satisfying the tail latency constraint. To the best of our knowledge, this is the first work to analyze and develop a power management technique using CPU C-states for latency-critical datacenter workloads.
Citations: 35
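The core reasoning behind postponing requests is a deadline calculation: a sleeping core can stay asleep as long as, once it wakes, all queued requests still finish within their latency target. The function below is a simplified sketch, not DynSleep itself: it assumes service times are known in advance, requests are served FIFO, times are integer microseconds, and requests arriving during the sleep period are ignored. The name `latest_wakeup_us` is hypothetical.

```python
from itertools import accumulate
from typing import Optional, Sequence

def latest_wakeup_us(now_us: int,
                     arrivals_us: Sequence[int],
                     service_us: Sequence[int],
                     target_us: int,
                     wakeup_us: int) -> Optional[int]:
    """Latest time the core can begin waking up so that the queued requests,
    served FIFO once it is awake, each finish within arrival + target."""
    if not arrivals_us:
        return None  # nothing queued: sleep until the next arrival
    # If processing starts at time S, request i completes at S + offsets[i].
    offsets = accumulate(service_us)
    # Each request needs S + offset_i <= arrival_i + target; S is bounded by the tightest one.
    latest_start = min(a + target_us - off for a, off in zip(arrivals_us, offsets))
    # Leave time for the C-state exit latency; never schedule a wake-up in the past.
    return max(now_us, latest_start - wakeup_us)
```

For example, with requests that arrived at 0 and 100 us, each needing 50 us, a 500 us target and a 20 us wake-up latency, a core checked at 120 us may sleep until 430 us, an idle window long enough to justify a deeper C-state than an immediate wake-up would allow.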
Energy-Efficient Adaptive Classifier Design for Mobile Systems
Zafar Takhirov, Joseph Wang, Venkatesh Saligrama, A. Joshi
DOI: 10.1145/2934583.2934615 (https://doi.org/10.1145/2934583.2934615)
Published: 2016-08-08
Abstract: With the continuous increase in the amount of data that must be processed by digital mobile systems, energy-efficient computation has become a critical design constraint for mobile systems. In this paper, we propose an adaptive classifier that leverages the wide variability in data complexity to enable energy-efficient data classification on mobile systems. Our approach takes advantage of varying classification "hardness" across data to dynamically allocate resources and improve energy efficiency. On average, across a wide range of classification data sets, our adaptive classifier is ≈ 100× more energy efficient than a complex radial basis function classifier at an ≈ 1% higher error rate, and ≈ 10× less energy efficient than a simple linear classifier at an ≈ 40% lower error rate.
Citations: 13
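One common way to exploit varying classification "hardness" is a two-stage cascade: a cheap linear classifier handles inputs far from its decision boundary and defers the rest to an expensive classifier. The abstract does not say this is exactly how the authors allocate resources, so treat this as an illustrative sketch; the margin threshold and the energy costs (1 and 100 units) are hypothetical placeholders.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]

def linear_score(w: Vector, b: float, x: Vector) -> float:
    """Signed score of a linear classifier: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def adaptive_classify(x: Vector, w: Vector, b: float,
                      complex_clf: Callable[[Vector], int],
                      margin_threshold: float,
                      cheap_cost: float = 1.0,
                      complex_cost: float = 100.0) -> Tuple[int, float]:
    """Return (label in {-1, +1}, energy units spent).

    Easy inputs (|score| >= threshold) are decided by the cheap linear model;
    inputs near its boundary are escalated to the complex classifier.
    """
    s = linear_score(w, b, x)
    if abs(s) >= margin_threshold:
        return (1 if s > 0 else -1), cheap_cost
    return complex_clf(x), cheap_cost + complex_cost
```

Raising `margin_threshold` sends more inputs to the complex classifier, trading energy for accuracy; the overall energy then depends on what fraction of the data set the linear stage can decide confidently.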
Session details: Accelerators for Machine Learning
Rangharajan Venkatesan, Wei Wu
DOI: 10.1145/3256013 (https://doi.org/10.1145/3256013)
Published: 2016-08-08
Citations: 0
Physical Design Solutions to Tackle FEOL/BEOL Degradation in Gate-level Monolithic 3D ICs
B. W. Ku, P. Debacker, D. Milojevic, P. Raghavan, D. Verkest, A. Thean, S. Lim
DOI: 10.1145/2934583.2934622 (https://doi.org/10.1145/2934583.2934622)
Published: 2016-08-08
Abstract: In this paper, we develop physical design tools and methodologies to tackle the inter-tier performance variations caused by low-temperature manufacturing in 2-tier gate-level monolithic 3D ICs (M3D). First, we model the top-tier front-end-of-line (FEOL) device mobility degradation and its impact on cell delay and power. Next, we quantify the impact of tungsten interconnect and cost-driven metal layer saving in the back-end-of-line (BEOL) of the bottom tier. These device and interconnect degradation models are used in our new full-chip M3D physical design flow, named Derated 2D. This flow overcomes the well-known drawback of the state-of-the-art Shrunk 2D flow, which requires shrinking of layout objects and RC parasitics. Derated 2D also performs low-temperature process-aware tier partitioning to keep timing-critical components in the bottom tier. Moreover, Derated 2D conducts timing-driven monolithic inter-tier via (MIV) planning to cope with the increased resistivity of tungsten BEOL. Lastly, Derated 2D offers an effective timing closure solution through post-route optimization. Experiments based on a foundry-grade 7nm FinFET process design kit (PDK) show that Derated 2D achieves up to 36% performance improvement and 10% energy saving compared with Shrunk 2D. Using post-route optimization, Derated 2D further improves timing under various FEOL/BEOL degradation settings with minimal energy overhead.
Citations: 17
An Efficient Parallel Scheduling Scheme on Multi-partition PCM Architecture
Wen Zhou, D. Feng, Yu Hua, Jingning Liu, Fangting Huang, Yu Chen
DOI: 10.1145/2934583.2934610 (https://doi.org/10.1145/2934583.2934610)
Published: 2016-08-08
Abstract: Phase Change Memory (PCM) is an emerging non-volatile memory with the salient features of large scale, high speed, low power and radiation resistance. It is hence an ideal candidate for next-generation main memory. However, PCM suffers from inefficient I/O performance due to long write latency. Recent studies propose a multi-partition (or multi-subarray) architecture within each bank to enhance internal parallelism. However, conventional scheduling schemes fail to exploit the advantages of multiple partitions and incur inefficient bank utilization. In this paper, we propose a Write Priority overlap Read (WPoR) scheduling scheme, which preferentially serves a write request in one partition and allows other partitions to serve as many read requests as possible within this partition's program duration. Experimental results demonstrate that WPoR reduces write latency by 24.7% on average compared with state-of-the-art scheduling algorithms. Meanwhile, WPoR improves IPC by 6%, 7% and 26% on average compared with the Read Priority, Write Pausing and Write Cancellation schemes, respectively.
Citations: 10
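The overlap idea in WPoR can be sketched as a greedy selection: while one partition is busy programming a write, reads targeting other partitions are issued in FIFO order until the write's program time is used up. This is a simplified model, not the paper's scheduler: it assumes reads serialize on a shared bank interface with a fixed read latency, uses hypothetical integer latencies (e.g. in ns), and ignores command timing details.

```python
from typing import List, Sequence, Tuple

Request = Tuple[str, int]  # (request id, target partition)

def wpor_schedule(write_partition: int,
                  write_latency: int,
                  read_queue: Sequence[Request],
                  read_latency: int) -> Tuple[List[str], List[Request]]:
    """Pick reads (FIFO order) that can overlap a write's program phase.

    Returns (ids of reads served during the write, reads left in the queue).
    """
    budget = write_latency  # time the write keeps its own partition busy
    served: List[str] = []
    deferred: List[Request] = []
    for req_id, partition in read_queue:
        # Reads to the busy partition must wait; others fit while budget remains.
        if partition != write_partition and budget >= read_latency:
            served.append(req_id)
            budget -= read_latency
        else:
            deferred.append((req_id, partition))
    return served, deferred
```

With a 150-unit write to partition 0, 50-unit reads and the queue `[("r1", 1), ("r2", 0), ("r3", 2), ("r4", 3), ("r5", 1)]`, reads r1, r3 and r4 are hidden behind the write, while r2 (same partition) and r5 (no time left) remain queued.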
Unified Power Frequency Model Framework
Sriram Sundaram, Warren He, Sriram Sambamurthy, Aaron Grenat, Steven Liepe, S. Naffziger
DOI: 10.1145/2934583.2934605 (https://doi.org/10.1145/2934583.2934605)
Published: 2016-08-08
Abstract: This paper describes a unified power-frequency model (UPFM) that combines analytical and empirical approaches to ensure a high degree of modeling flexibility and accuracy relative to measured silicon (Si) results. At one end, System-on-a-Chip (SoC) design teams focus the bulk of their efforts on using detailed low-level models to verify power consumption. Such models are available only late in the design cycle and are often limited in the number of workloads that can be evaluated. At the other end, FPGA-based modeling and spreadsheet approaches that operate at a higher level of abstraction have been proposed; however, these are often limited by poor correlation to measured Si results. In addition, extant models typically focus on either power projection or prediction of Si speed, but not both. A unified approach is much needed, since today's SoCs have to meet stringent power and performance constraints simultaneously. The proposed UPFM overcomes these limitations. First, actual measured Si results serve as the empirical baseline for projections, so that differences between simulated and measured results can be calibrated. Second, each IP is analytically modeled using a large number of relevant parameters. This high level of abstraction makes the model useful from the early design cycle all the way to the mature phase (parameters are refined over time). In addition, wide-ranging parameters have been carefully chosen (and improved over multiple product generations) so that accuracy is not sacrificed. We demonstrate UPFM as a comprehensive framework in which technology, architecture and infrastructure (test/thermal) choices can be modeled with high accuracy to drive optimal perf-per-watt SoC designs.
Citations: 0