{"title":"Clustering-based revision debug in regression verification","authors":"Djordje Maksimovic, A. Veneris, Zissis Poulos","doi":"10.1109/ICCD.2015.7357081","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357081","url":null,"abstract":"Modern digital systems are growing in size and complexity, introducing significant organizational and verification challenges in the design cycle. Verification today takes as much as 70% of the design time with debugging being responsible for half of this effort. Automation has mitigated part of the resource-intensive nature of rectifying erroneous designs. Nevertheless, most tools target failures in isolation. Since regression verification can discover myriads of failures in one run, automation is also required to guide an engineer to rank them and expedite debugging. To address this growing regression pain, this paper presents a framework that utilizes traditional machine learning techniques along with historical data in version control systems and the results of functional debugging. Its aim is to rank revisions based on their likelihood of being responsible for a particular failure. Ranking prioritizes revisions that ought to be targeted first, and therefore it speeds up the localization of the error source. This effectively reduces the number of debug iterations. Experiments on industrial designs demonstrate a 68% improvement in the ranking of actual erroneous revisions versus the ranking obtained through existing industrial methodologies. This benefit arrives with negligible run-time overhead.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130390393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Side-channel power analysis of a GPU AES implementation","authors":"Chao Luo, Yunsi Fei, Pei Luo, Saoni Mukherjee, D. Kaeli","doi":"10.1109/ICCD.2015.7357115","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357115","url":null,"abstract":"Graphics Processing Units (GPUs) have been used to run a range of cryptographic algorithms. The main reason to choose a GPU is to accelerate the encryption/decryption speed. Since GPUs are mainly used for graphics rendering, and only recently have they become a fully-programmable parallel computing device, there has been little attention paid to their vulnerability to side-channel attacks. In this paper we present a study of side-channel vulnerability on a state-of-the-art graphics processor. To the best of our knowledge, this is the first work that attempts to extract the secret key of a block cipher implemented to run on a GPU. We present a side-channel power analysis methodology to extract all of the last round key bytes of a CUDA AES (Advanced Encryption Standard) implementation run on an NVIDIA TESLA GPU. We describe how we capture power traces and evaluate the power consumption of a GPU. We then construct an appropriate power model for the GPU. We propose effective methods to sample and process the GPU power traces so that we can recover the secret key of AES. Our results show that parallel computing hardware systems such as a GPU are highly vulnerable targets to power-based side-channel attacks, and need to be hardened against side-channel threats.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130824754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A thermal adaptive scheme for reliable write operation on RRAM based architectures","authors":"F. García-Redondo, M. López-Vallejo, P. Ituero","doi":"10.1109/ICCD.2015.7357126","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357126","url":null,"abstract":"Resistive RAMs (RRAMs) are one of the most promising alternatives to future storage and neuromorphic computing systems. However, the behavior of RRAM highly depends on voltage, crossbar design and operation temperature. Actually, the circuit temperature becomes one of the most critical issues in fast memories during writing operations. In this paper we propose a novel thermal-adaptive RRAM writing scheme, applicable to crossbar memories, whose smart operation is able to mitigate the writing errors induced by temperature variations. Using a sensing-acting scheme our system is able to improve the memory reliability without affecting the writing/reading performance. Moreover, the proposed architecture is compatible with most proposed write/read designs making achievable multibit storage, which requires extremely accurate operations.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"953 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123702838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving memristor memory with sneak current sharing","authors":"Manjunath Shevgoor, Naveen Muralimanohar, R. Balasubramonian, Yoocharn Jeon","doi":"10.1109/ICCD.2015.7357164","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357164","url":null,"abstract":"Several memory vendors are pursuing different kinds of memory cells that can offer high density, non-volatility, high performance, and high endurance. There are several on-going efforts to architect main memory systems with these new NVMs that can compete with traditional DRAM systems. Each NVM has peculiarities that require new microarchitectures and protocols for memory access. In this work, we focus on memristor technology and the sneak currents inherent in memristor crossbar arrays. A read in state-of-the-art designs requires two consecutive reads; the first measures background sneak currents that can be factored out of the current measurement in the second read. This paper introduces a mechanism to reuse the background sneak current measurement for subsequent reads from the same column, thus introducing \"open-column\" semantics for memristor array access. We also examine a number of data mapping policies that allow the system to balance parallelism and locality. We conclude that on average, it is better to prioritize locality; our best design yields a 20% improvement in read latency and a 26% memory power reduction, relative to the state-of-the-art memristor baseline.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116723324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing reconfigurability with memristive interconnects","authors":"J. Demme, B. Rajendran, S. Nowick, S. Sethumadhavan","doi":"10.1109/ICCD.2015.7357124","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357124","url":null,"abstract":"The design of on-chip interconnects is largely governed by the size and power of the devices being connected. While large components like memory controllers, video decode accelerators, and cores can afford the overhead of a large packet switching NoC router, smaller components like adders or other ALUs cannot. Instead, they are typically connected via simple wires, limiting their runtime reconfigurability. The notable exception - FPGAs - use an interconnect which allows extreme reconfigurability, but the FPGA pays for it in area, power, and latency costs. Less costly reconfigurable interconnects, therefore, could allow hardware designers to expose more reconfigurability while limiting area and power costs. This paper presents the design of a high-radix circuit switching crossbar design using memristors. This design utilizes Phase Change Memory (PCM), overcoming some of its limitations such as leakage power and low voltage operation. The very small size of memristors shrinks the area, power, and latency of crossbars by up to 16x, 4.4x, and 2.4x, respectively, leaving little interconnect overhead but wiring overhead. As a tool for designers, memristive interconnects offer significant potential to increase runtime design flexibility.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115693865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analytic processor model for fast design-space exploration","authors":"R. Jongerius, Giovanni Mariani, Andreea Anghel, G. Dittmann, E. Vermij, H. Corporaal","doi":"10.1109/ICCD.2015.7357136","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357136","url":null,"abstract":"In this paper, we propose an analytic model that takes as inputs a) a parametric microarchitecture-independent characterization of the target workload, and b) a hardware configuration of the core and the memory hierarchy, and returns as output an estimation of processor-core performance. To validate our technique, we compare our performance estimates with measurements on an Intel® Xeon® system. The average error increases from 21% for a state-of-the-art simulator to 25% for our model, but we achieve a speedup of several orders of magnitude. Thus, the model enables fast design-space exploration and represents a first step towards an analytic exascale system model.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115186208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic core scaling: Trading off performance and energy beyond DVFS","authors":"Wei Zhang, Hang Zhang, J. Lach","doi":"10.1109/ICCD.2015.7357120","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357120","url":null,"abstract":"Dynamic voltage and frequency scaling (DVFS) is commonly employed on modern superscalar processors to reduce energy when peak performance is not needed or allowed. As technology scales, the effectiveness of DVFS is limited by the shrinking viable supply voltage range. This work proposes dynamic core scaling (DCS) to extend performance-energy tradeoff capabilities in superscalar processors. DCS ensures that programs run at a given percentage of their maximum speed and, at the same time, minimizes energy consumption by dynamically adjusting the active superscalar datapath resources. Evaluations using an 8-way superscalar processor implemented on 45nm circuit infrastructure show that DCS is more effective in performance-energy tradeoffs than DVFS at the high performance end. When used together with DVFS, DCS saves an additional 20% of a full-size core's energy on average. At the minimum operating voltage, DVFS stops reducing energy, while DCS is still able to achieve an average of 46% further energy reduction.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128672242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DLB: Dynamic lane borrowing for improving bandwidth and performance in Hybrid Memory Cube","authors":"Xianwei Zhang, Youtao Zhang, Jun Yang","doi":"10.1109/ICCD.2015.7357093","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357093","url":null,"abstract":"The Hybrid Memory Cube (HMC) is an innovative DRAM architecture that adopts 3D-stacking to improve bandwidth and save energy. An HMC module adopts separate receive and transmit lanes and thus may achieve the maximal memory bandwidth only if data can be driven at full speed in both directions. However, due to the natural read and write imbalance in modern applications, the effective memory bandwidth utilization is often low, leading to suboptimal system performance. In this paper, we propose DLB (dynamic lane borrowing) that dynamically tracks link utilization and partitions the lanes in one link between receive and transmit directions. DLB allocates more lanes to transmit if servicing read-intensive applications. With more lanes allocated to either direction, DLB reduces the lane contention along that direction and thus the average memory access latency. Our experimental results show that DLB improves the bandwidth utilization by 10.4% on average, reduces the average utilization gap in two directions from 35.6% to 12.8%, and saves execution time by as much as 22.3%.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130618395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Methods for analysing and improving the fault resilience of delay-insensitive codes","authors":"J. Lechner, A. Steininger, F. Huemer","doi":"10.1109/ICCD.2015.7357160","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357160","url":null,"abstract":"Delay-insensitive (DI) codes are usually prone to transient faults occurring during an ongoing transmission. For most DI codewords even a single transient can turn an incomplete transmission into a complete codeword, which is different from the originally sent codeword. Unless further redundant information is provided, the receiver has no means to detect such a transmission fault. In this paper we therefore propose two methods to systematically increase redundancy, either by i) building resilient subcodes, or by ii) using a two-step data encoding where error detecting codes are appropriately combined with delay-insensitive codes. In contrast to existing approaches we carefully avoid the introduction of timing assumptions to mask faults. Both methods are generic and can be used for any 4-phase DI code. In this paper we apply them to m-of-n codes, Berger and Zero-Sum codes and thoroughly analyse the efficiency of the resulting coding schemes.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124325732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A testing platform for on-drone computation","authors":"Wang Zhou, Dhruv Nair, O. Gunawan, T. V. Kessel, H. Hamann","doi":"10.1109/ICCD.2015.7357188","DOIUrl":"https://doi.org/10.1109/ICCD.2015.7357188","url":null,"abstract":"This paper describes the development of a test bed for an on-drone computation system, in which the drone plays the game of ping-pong competitively (YCCD: The Yorktown Cognitive Competition Drone). Unlike other drone systems and demonstrators YCCD will be completely autonomous with no external support from cameras, servers, GPS etc. YCCD will have ultra-low power computation capabilities including on-drone real-time processing for vision and localization (non-GPS based). Architectural design and processing algorithms of the system are discussed in detail.","PeriodicalId":129506,"journal":{"name":"2015 33rd IEEE International Conference on Computer Design (ICCD)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125042741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}