{"title":"A temperature-aware synthesis approach for simultaneous delay and leakage optimization","authors":"Nathaniel A. Conos, M. Potkonjak","doi":"10.1109/ICCD.2013.6657059","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657059","url":null,"abstract":"Accurate thermal knowledge is essential for achieving ultra low power in deep sub-micron CMOS technology, as it affects gate speed linearly and leakage exponentially. We propose a temperature-aware synthesis technique that efficiently utilizes input vector control (IVC), dual-threshold voltage gate sizing (GS) and pin reordering (PR) for performing simultaneous delay and leakage power optimization. To the best of our knowledge, we are the first to consider these techniques in a synergistic fashion with thermal knowledge. We evaluate our approach by showing improvements over each method when considered in isolation and in conjunction. We also study the impact of employing considered techniques with/without accurate thermal knowledge. We ran simulations on synthesized ISCAS-85 and ITC-99 circuits on a 45 nm cell library while conforming to an industrial design flow. Leakage power improvements of up to 4.54X (2.14X avg.) were achieved when applying thermal knowledge over equivalent methods that do not.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129238977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic bandwidth adaptation using recognition accuracy prediction through pre-classification for embedded vision systems","authors":"Yang Xiao, Chuanjun Zhang, K. Inck, N. Vijaykrishnan","doi":"10.1109/ICCD.2013.6657020","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657020","url":null,"abstract":"Empowered by the massive growth of camera enabled mobile devices; mobile applications that allow users to perceive and experience the world in richer and more engaging ways have emerged at tremendous pace. As more complex perception algorithms are developed to take advantage of higher resolution imagery, future mobile applications will require application specific accelerators to maintain performance required for interactive user experiences. A key challenge in these accelerator-rich mobile platforms will be guaranteeing the off-chip memory bandwidth required by each accelerator. Device integration techniques such as Package on Package and Wide-IO seek to tackle the memory wall problem by reducing bottlenecks at the I/O interfaces. However, less effort has been focused on solving the bandwidth problem by dynamically leveraging the individual and collective bandwidth characteristics of accelerators operating concurrently. This work investigates the off-chip bandwidth characteristics of accelerators in the context of embedded perceptual computing applications. A bandwidth aware feedback system is proposed that dynamically partitions available bandwidth among a set of accelerators at the expense of application accuracy. As a case study, the proposed adaption policy is applied to a biologically-inspired scene understanding application. Results indicate that the system maintains good accuracy while requiring only 25% of the original bandwidth.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128081963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards analyzing and improving robustness of software applications to intermittent and permanent faults in hardware","authors":"A. Sharma, Joseph Sloan, L. Wanner, Salma Elmalaki, M. Srivastava, Puneet Gupta","doi":"10.1109/ICCD.2013.6657076","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657076","url":null,"abstract":"Although a significant fraction of emerging failure and wearout mechanisms result in intermittent or permanent faults in hardware, their impact (as distinct from transient faults) on software applications has not been well studied. In this paper, we develop a distinguishing application characteristic, referred to as similarity from fundamental circuit-level understanding of the failure mechanisms. We present a mathematical definition and a procedure for similarity computation for practical software applications and experimentally verify the relationship between similarity and fault rate. Leveraging dependence of application robustness on the similarity metric, we present example architecture independent code transformations to reduce similarity and thereby the worst-case fault rate with minimal performance degradation. Our experimental results with arithmetic unit faults show as much as 74% improvement in the worst case fault rate on benchmark kernels, with less than 10% runtime penalty.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134069320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JOP-alarm: Detecting jump-oriented programming-based anomalies in applications","authors":"Fan Yao, Jie Chen, Guru Venkataramani","doi":"10.1109/ICCD.2013.6657084","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657084","url":null,"abstract":"Code Reuse-based Attacks (popularly known as CRA) are becoming increasingly notorious because of their ability to reuse existing code, and evade the guarding mechanisms in place to prevent code injection-based attacks. Among the recent code reuse-based exploits, Jump Oriented Programming (JOP) captures short sequences of existing code ending in indirect jumps or calls (known as gadgets), and utilizes them to cause harmful, unintended program behavior. In this work, we propose a novel, easily implementable algorithm, called JOP-alarm, that computes a score value to assess the potential for JOP attack, and detects possibly harmful program behavior. We demonstrate the effectiveness of our algorithm using published JOP code, and test the false positive alarm rate using several unmodified SPEC2006 benchmarks.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123966919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiler-based approach to reducing leakage energy of instruction scratch-pad memories","authors":"Y. Huangfu, Wei Zhang","doi":"10.1109/ICCD.2013.6657077","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657077","url":null,"abstract":"In this paper, we study a compiler-based approach to reducing the instruction SPM leakage energy efficiently, which can also minimize the performance overhead. Our evaluation indicates that the compiler-based approach is superior to periodical or bank-based methods. On average, the compiler-based method can reduce the SPM leakage energy by nearly 89.82%, with only 0.25% performance overhead.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123710561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power gating with block migration in chip-multiprocessor last-level caches","authors":"David Kadjo, Hyungjun Kim, Paul V. Gratz, Jiang Hu, R. Ayoub","doi":"10.1109/ICCD.2013.6657030","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657030","url":null,"abstract":"We propose a novel technique to significantly reduce the leakage energy of last level caches while mitigating any significant performance impact. In general, cache blocks are not ordered by their temporal locality within the sets; hence, simply power gating off a partition of the cache, as done in previous studies, may lead to considerable performance degradation. We propose a solution that migrates the high temporal locality blocks to facilitate power gating, where blocks likely to be used in the future are migrated from the partition being shutdown to the live partition at a negligible performance impact and hardware overhead. Our detailed simulations show energy savings of 66% at low performance degradation of 2.16%.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121070426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Variation tolerance and error resilience in a low power wireless receiver","authors":"J. Hoogerbrugge","doi":"10.1109/ICCD.2013.6657063","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657063","url":null,"abstract":"Power consumption of digital baseband processing of a wireless receiver can be reduced by operating the circuits at a reduced voltage where setup timing errors occur occasionally in a controlled way. One of the challenges is then to estimate the BER of the receiver and to create a control loop that controls the voltage such that the estimated BER is within the specifications of the system. The paper describes two mechanisms to realize such a control loop. The first one uses parity-based error detection; the second one is based on the application of forward error correction in the system. Both mechanisms have been modeled in an industrial low power receiver design that includes a model for setup timing error injection. Simulation results show that the control loops are able to accurately control the voltage to the lowest possible level such that the BER stays within the specified constraints.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131615080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-aware synthesis of application specific MPSoCs","authors":"Thannirmalai Somu Muthukaruppan, Haris Javaid, T. Mitra, S. Parameswaran","doi":"10.1109/ICCD.2013.6657026","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657026","url":null,"abstract":"In this paper, we propose a framework for synthesis of application specific MultiProcessor System on Chip (MPSoC) for multimedia applications. Our framework searches for a design with minimum energy consumption under area and period constraints. We simultaneously explore selection of voltage-frequency levels, custom instructions, cache configurations, and task mapping. We propose an optimal algorithm based on prune and search operations to efficiently search the complex design space. We also present a heuristic based on map and customize stages to better handle the exponential complexity of the design space, and rapidly find near-optimal solutions. These algorithms are aided by two estimators that can quickly estimate period and energy consumption of a given design point. Experiments reveal that our framework can reduce energy consumption by 37.9% on an average and 57.1% maximum reduction compared to solutions obtained from a combination of existing techniques.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121753760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis and minimization of short-circuit current in mesh clock network","authors":"Seongbo Shim, Minyoung Mo, Sangmin Kim, Youngsoo Shin","doi":"10.1109/ICCD.2013.6657082","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657082","url":null,"abstract":"Mesh clock network is very effective at reducing clock skew. But mesh causes a large increase of power consumption, in particular due to shorted buffers. We first analyze the short-circuit power consumption of the mesh clock network. It is observed that skew distribution of premesh tree is important in determining the amount of short-circuit power. We then propose a new clock buffer, which practically eliminates short-circuit current in a mesh network. Experiments on a few test circuits using 40-nm technology indicate that clock power consumption is reduced by 13.0% on average with 4.8% of area increase; this can be compared to buffer sizing, which only achieves 5.6% saving of power.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133504014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RECAP: Region-Aware Cache Partitioning","authors":"Karthik T. Sundararajan, Timothy M. Jones, N. Topham","doi":"10.1109/ICCD.2013.6657056","DOIUrl":"https://doi.org/10.1109/ICCD.2013.6657056","url":null,"abstract":"In recent years, high performance computing systems have obtained more processing cores and share a last level cache (LLC). However, as their number grows, the core-to-way ratio in the LLC increases, presenting problems to existing cache partitioning techniques which require more ways than cores. Furthermore, effective energy management of the LLC becomes increasingly important due to its size. This paper proposes a Region Aware Cache Partitioning (RECAP), an LLC energy-saving scheme for high-performance, many-core processors. RECAP partitions the data within the cache into shared and private regions. Applications only access the ways containing the data that they require, realising dynamic energy savings. Any ways that are not within the shared or private regions can be turned off to save static energy. We evaluate our scheme using an 8-core CMP running multi-programmed workloads and show that it achieves 17% dynamic and 13% static energy savings in the shared LLC with a 15% performance gain. Across our multi-threaded applications, we achieve 17% dynamic and 41% static energy savings with no impact on performance.","PeriodicalId":398811,"journal":{"name":"2013 IEEE 31st International Conference on Computer Design (ICCD)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134480646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}