{"title":"A Robust and Energy-Efficient Classifier Using Brain-Inspired Hyperdimensional Computing","authors":"Abbas Rahimi, P. Kanerva, J. Rabaey","doi":"10.1145/2934583.2934624","DOIUrl":"https://doi.org/10.1145/2934583.2934624","url":null,"abstract":"The mathematical properties of high-dimensional (HD) spaces show remarkable agreement with behaviors controlled by the brain. Computing with HD vectors, referred to as \"hypervectors,\" is a brain-inspired alternative to computing with numbers. Hypervectors are high-dimensional, holographic, and (pseudo)random with independent and identically distributed (i.i.d.) components. They provide for energy-efficient computing while tolerating hardware variation typical of nanoscale fabrics. We describe a hardware architecture for a hypervector-based classifier and demonstrate it with language identification from letter trigrams. The HD classifier is 96.7% accurate, 1.2% lower than a conventional machine learning method, operating with half the energy. Moreover, the HD classifier is able to tolerate 8.8-fold probability of failure of memory cells while maintaining 94% accuracy. This robust behavior with erroneous memory cells can significantly improve energy efficiency.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123888253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Low Power Design Methodologies","authors":"A. Fahim","doi":"10.1145/3256020","DOIUrl":"https://doi.org/10.1145/3256020","url":null,"abstract":"","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"132 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130876386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Effective and Efficient Quality Management for Approximate Computing","authors":"Ting Wang, Qian Zhang, N. Kim, Q. Xu","doi":"10.1145/2934583.2934608","DOIUrl":"https://doi.org/10.1145/2934583.2934608","url":null,"abstract":"Approximate computing, where computation quality is traded off for better performance and/or energy savings, has gained significant traction from both academia and industry. With approximate computing, we expect to obtain acceptable results, but how do we make sure the quality of the final results is acceptable? This challenging problem remains largely unexplored. In this paper, we propose an effective and efficient quality management framework to achieve controlled quality-efficiency tradeoffs. To be specific, at the offline stage, our solution automatically selects an appropriate approximator configuration considering rollback recovery for large occasional errors with minimum cost under the target quality requirement. Then during the online execution, our framework judiciously determines when and how to roll back, which is achieved with cost-effective yet accurate quality predictors that synergistically combine the outputs of several basic light-weight predictors. Experimental results demonstrate that our proposed solution can achieve 11% to 23% energy savings compared to existing solutions under the target quality requirement.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130361964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SATS: An Ultra-Low Power Time Synchronization for Solar Energy Harvesting WSNs","authors":"Tongda Wu, Yongpan Liu, Hehe Li, C. Xue, H. Lee, Huazhong Yang","doi":"10.1145/2934583.2934601","DOIUrl":"https://doi.org/10.1145/2934583.2934601","url":null,"abstract":"Reliable and ultra-low power time synchronization becomes more and more important with the popularity of energy harvesting sensor nodes. This paper proposes an untethered and probabilistic ultra-low power time synchronization method for energy-intermittent sensor networks. It avoids frequent RF communication with the assistance of a solar clock. The SATS system consists of two main parts: the synchronizer, a low power solar clock module for time synchronization, and the S3-Mapping, an offline sequence matching algorithm. Furthermore, we develop an improved version of S3-Mapping, which reduces the computation complexity from exponential to linear using the redundancy models and the onion peeling method. The SATS system is validated by both simulations and a prototype, which show that second-level synchronization precision can be achieved with reasonable probability. What's more, the energy consumption of time synchronization is reduced by one to two orders of magnitude compared with the up-to-date low power time synchronization protocol.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115168314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OS-based Resource Accounting for Asynchronous Resource Use in Mobile Systems","authors":"Farshad Ghanei, Pranav Tipnis, Kyle Marcus, Karthik Dantu, Steven Y. Ko, Lukasz Ziarek","doi":"10.1145/2934583.2934639","DOIUrl":"https://doi.org/10.1145/2934583.2934639","url":null,"abstract":"One essential functionality of a modern operating system is to accurately account for the resource usage of the underlying hardware. This is especially important for computing systems that operate on battery power, since energy management requires accurately attributing resource uses to processes. However, components such as sensors, actuators and specialized network interfaces are often used in an asynchronous fashion, which makes it difficult to conduct accurate resource accounting. For example, a process that makes a request to a sensor may not be running on the processor for the full duration of the resource usage, and current mechanisms of resource accounting fail to provide accurate accounting for such asynchronous uses. This paper proposes a new mechanism to accurately account for the asynchronous usage of resources in mobile systems. Our insight is that by accurately relating the user requests with kernel requests to the device and the corresponding device responses, we can accurately attribute resource use to the requesting process. Our prototype implemented in Linux demonstrates that we can account for the usage of asynchronous resources such as GPS and WiFi accurately.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"332 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114233958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and Design of Energy Efficient Time Domain Signal Processing","authors":"Zhengyu Chen, Jie Gu","doi":"10.1145/2934583.2934585","DOIUrl":"https://doi.org/10.1145/2934583.2934585","url":null,"abstract":"Time domain signal processing (TDSP) encodes information into time rather than voltage with higher efficiency than conventional digital design. This paper performs a systematic analysis of the design principle and energy efficiency of TDSP. Variation impact, which poses significant challenges to TDSP, is evaluated and a variation-driven design methodology is proposed to achieve an optimum tradeoff between energy efficiency and design robustness. Several novel circuit-level design techniques such as a dual encoding strategy and bit-scalable design are also proposed in this work to significantly improve the energy efficiency of TDSP. A design example on a critical building block of a facial recognition application is used to demonstrate the potential of the technique. The result in a 45nm technology shows that a 3.3X energy-delay product reduction and 34% area saving can be achieved using TDSP compared with conventional digital design techniques.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123730222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measurement-Driven Methodology for Evaluating Processor Heterogeneity Options for Power-Performance Efficiency","authors":"William J. Song, A. Buyuktosunoglu, Chen-Yong Cher, P. Bose","doi":"10.1145/2934583.2934637","DOIUrl":"https://doi.org/10.1145/2934583.2934637","url":null,"abstract":"It is generally perceived that heterogeneous multicore processors will provide better performance and power efficiency over conventional homogeneous cores. However, heterogeneity can also be achieved within a homogeneous core design, instantiated under different voltage-frequency settings or per-core simultaneous multi-threading (SMT) modes. In this paper, we pursue an architectural study motivated by the question, \"Can we get by with a single, complex SMT-equipped core design that can operate at different voltage-frequency points? Or, is it mandatory to invest into two different core types, one complex and the other simple?\" We propose a systematic, measurement-driven methodology to evaluate processor heterogeneity options. Our analysis particularly focuses on the domain of real-time constrained embedded processors. The study is based on a direct measurement of two real processors: one that uses simple in-order cores, and another that uses complex out-of-order cores. The effect of heterogeneous core composition (consisting of complex and simple cores in the same chip) is analytically projected from measurements gleaned from the two different systems. Our analysis yields new interesting insights. When dealing with two core types without SMT enabled, true core heterogeneity does not necessarily provide better performance or power efficiency under area and power constraints. If the complex-core homogeneous processor invokes SMT, it outperforms true heterogeneity by offering 28% better power efficiency, assuming that simple cores in the heterogeneous system operate only in single-threaded mode without SMT capability. If the small cores employ SMT, true heterogeneity yields 32% better power efficiency than the homogeneous processor with SMT.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114966076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Efficient CNN Implementation on a Deeply Pipelined FPGA Cluster","authors":"Chen Zhang, Di Wu, Jiayu Sun, Guangyu Sun, Guojie Luo, J. Cong","doi":"10.1145/2934583.2934644","DOIUrl":"https://doi.org/10.1145/2934583.2934644","url":null,"abstract":"Recently, FPGA-based CNN accelerators have demonstrated superior energy efficiency compared to high-performance devices like GPGPUs. However, due to the constrained on-chip resource and many other factors, single-board FPGA designs may have difficulties in achieving optimal energy efficiency. In this paper we present a deeply pipelined multi-FPGA architecture that expands the design space for optimal performance and energy efficiency. A dynamic programming algorithm is proposed to map the CNN computing layers efficiently to different FPGA boards. To demonstrate the potential of the architecture, we built a prototype system with seven FPGA boards connected with high-speed serial links. The experimental results on AlexNet and VGG-16 show that the prototype can achieve up to 21x and 2x energy efficiency compared to optimized multi-core CPU and GPU implementations, respectively.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134069576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Thermal-Aware Physical Space Allocation Strategy for 3D Flash Memory Storage Systems","authors":"Yi Wang, Mingxu Zhang, Lisha Dong, Xuan Yang","doi":"10.1145/2934583.2934638","DOIUrl":"https://doi.org/10.1145/2934583.2934638","url":null,"abstract":"Three-dimensional (3D) flash memory stacks layers of data storage cells vertically to overcome the scaling limits in conventional planar NAND flash memory. Current 3D flash memory faces new challenges including thermal issues and complex manufacturing process. This paper presents TheraPhy, a novel thermal-aware physical space allocation strategy for three-dimensional flash memory storage systems. TheraPhy permutes the allocation of physical blocks. Consecutively accessed logical blocks are distributed to different physical locations in order to prevent the accumulation of hotspots. TheraPhy requires no changes to the file system, on-chip memory hierarchy, or hardware implementation of 3D flash memory. Based on TheraPhy, we present an address mapping strategy that is capable of determining the allocation of physical blocks based on their thermal status. We demonstrate the viability of the proposed technique using a set of extensive experiments. Experimental results show that TheraPhy can reduce the peak temperature by 15.39% with less than 1% extra erase overhead in comparison with the baseline scheme.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133576331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to Cope with Slow Transistors in the Top-tier of Monolithic 3D ICs: Design Studies and CAD Solutions","authors":"S. Samal, D. Nayak, M. Ichihashi, S. Banna, S. Lim","doi":"10.1145/2934583.2934643","DOIUrl":"https://doi.org/10.1145/2934583.2934643","url":null,"abstract":"In this paper we study the impact of low thermal budget process on design quality in monolithic 3D ICs (M3D). Specifically, we quantify how much the tier-to-tier transistor performance difference affects full-chip power and performance metrics in a foundry 14nm FinFET technology. Our study first shows that 5%, 10%, and 15% top-tier device degradation in a wire-dominated, timing-closed monolithic 3D IC design leads to 7%, 12%, and 18% full-chip timing violation, respectively. Next, we address this impact with our CAD solution named Tier-Aware M3D (TA-M3D) flow that identifies potential timing-critical paths and partitions them into the faster (bottom) tier to minimize the top-tier degradation impact. One unique challenge in timing closure in this case is how to conduct buffering and sizing on the paths that lie entirely in the top or bottom tier as well as those that span both tiers. Our approach handles all 3 types of paths carefully and closes timing under the given top-tier degradation assumption, while minimizing the total power consumption. Our enhanced monolithic 3D IC designs, even with 5%, 10%, and 15% slower transistors in the top tier, still offer 26%, 24%, and 5% power savings over 2D IC, respectively. Our study also covers other types of circuits.","PeriodicalId":142716,"journal":{"name":"Proceedings of the 2016 International Symposium on Low Power Electronics and Design","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132627873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}