{"title":"Brook Auto: High-Level Certification-Friendly Programming for GPU-powered Automotive Systems","authors":"Matina Maria Trompouki, Leonidas Kosmidis","doi":"10.1145/3195970.3196002","DOIUrl":"https://doi.org/10.1145/3195970.3196002","url":null,"abstract":"Modern automotive systems require increased performance to implement Advanced Driving Assistance Systems (ADAS). GPU-powered platforms are promising candidates for such computational tasks; however, current low-level programming models challenge the accelerator software certification process, while they limit the hardware selection to a fraction of the available platforms. In this paper we present Brook Auto, a high-level programming language for automotive GPU systems which removes these limitations. We describe the challenges and solutions we faced in its implementation, as well as a complete evaluation in terms of performance and productivity, which shows the effectiveness of our method.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"17 3 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76990721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Fast and Robust Failure Analysis of Memory Circuits Using Adaptive Importance Sampling Method","authors":"Xiao Shi, Jun Yang, Fengyuan Liu, Lei He","doi":"10.1145/3195970.3195972","DOIUrl":"https://doi.org/10.1145/3195970.3195972","url":null,"abstract":"Performance failure has become a growing concern for the robustness and reliability of memory circuits. It is challenging to accurately estimate the extremely small failure probability when failed samples are distributed in multiple disjoint failure regions. In this paper, we develop an adaptive importance sampling (AIS) method. AIS has several iterations of sampling region adjustments, while existing methods pre-decide a static sampling distribution. By iteratively searching for failure regions, AIS may lead to better efficiency and accuracy. This is validated by our experiments. For an SRAM cell with a single failure region, AIS uses 5–10X fewer samples and reaches better accuracy when compared to several recent methods. For a sense amplifier circuit with multiple failure regions, AIS is 4369X faster than MC without compromising accuracy, while other methods fail to cover all failure regions in our experiment.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"97 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76188637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locality Aware Memory Assignment and Tiling","authors":"Samuel Rogers, H. Tabkhi","doi":"10.1145/3195970.3196070","DOIUrl":"https://doi.org/10.1145/3195970.3196070","url":null,"abstract":"With the trend toward specialization, an efficient memory-path design is vital to capitalize on customization in the data-path. A monolithic memory hierarchy is often highly inefficient for irregular applications, traditionally targeted for CPUs. New approaches and tools are required to offer application-specific memory customization combining the benefits of cache and scratchpad memory simultaneously. This paper introduces a novel approach for automated application-specific on-chip memory assignment and tiling. The approach offers two major tools: (1) static memory access analysis and (2) variable-level memory assignment. (1) Static memory access analysis operates at the LLVM abstraction level. It extracts target-independent pointer behaviors, measures the access strides, and analyzes the prefetchability of variables. (2) Variable-level memory assignment creates a memory allocation graph for memory assignment (cache vs. scratchpad) based on the variables' size and their estimated locality. It also explores the opportunity for tiling memory access. For the exploration and results, this paper uses Machsuite benchmarks (with both regular & irregular memory access behaviors), and the gem5-Aladdin tool for performance & power evaluation. The proposed approach optimizes the memory hierarchy by automatically combining the benefits of cache and (tiled-) scratchpad at variable-level granularity for individual applications. The results demonstrate more than 45% improvement in our power-stall product, on average, over the monolithic cache or scratchpad design.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88374071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing the Overhead of Authenticated Memory Encryption Using Delta Encoding and ECC Memory","authors":"Salessawi Ferede Yitbarek, T. Austin","doi":"10.1145/3195970.3196102","DOIUrl":"https://doi.org/10.1145/3195970.3196102","url":null,"abstract":"Data stored in an off-chip memory, such as DRAM or non-volatile main memory, can potentially be extracted or tampered with by an attacker with physical access to a device. Protecting against such attacks requires storing message authentication codes and counters, which incur a 22% storage overhead. In this work, we propose techniques for reducing these overheads. We first present a scheme that leverages ECC DRAMs to reduce MAC verification & storage overheads. We replace the parity bits in standard ECC by a combination of MAC and parity bits to provide both authentication and error correction. This eliminates the extra MAC storage and minimizes the verification overhead as MACs can be read in parallel with data through the ECC bus. Next, we use efficient integer encodings to reduce counter storage overhead by 6× while enhancing application performance.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"74 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80564750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SRAM based Opportunistic Energy Efficiency Improvement in Dual-Supply Near-Threshold Processors","authors":"Yunfei Gu, Dengxue Yan, Vaibhav Verma, M. Stan, Xuan Zhang","doi":"10.1145/3195970.3196121","DOIUrl":"https://doi.org/10.1145/3195970.3196121","url":null,"abstract":"Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, optimal supply demands from logic core and on-chip memory are conflicting. In this paper, we perform reliability analysis of 6T SRAM and discover imbalanced minimum voltage requirements between read and write operations. We leverage this imbalance property in near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that the proposed technique improves energy efficiency by more than 18% with approximately 8.54% performance speed-up.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"26 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83689313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achieving Defect-Free Multilevel 3D Flash Memories with One-Shot Program Design","authors":"Chien-Chung Ho, Yung-Chun Li, Yuan-Hao Chang, Yu-Ming Chang","doi":"10.1145/3195970.3195982","DOIUrl":"https://doi.org/10.1145/3195970.3195982","url":null,"abstract":"To store the desired data on MLC and TLC flash memories, the conventional programming strategies need to divide a fixed range of threshold voltage (Vt) window into several parts. The narrowly partitioned Vt window in turn limits the design of programming strategy and becomes the main reason to cause flash-memory defects, i.e., the longer read/write latency and worse data reliability. This motivates this work to explore the innovative programming design for solving the flash-memory defects. Thus, to achieve the defect-free 3D NAND flash memory, this paper presents and realizes a one-shot program design to significantly eliminate the negative impacts caused by conventional programming strategies. The proposed one-shot program design includes two strategies, i.e., prophetic and classification programming, for MLC flash memories, and the idea is extended to TLC flash memories. The measurement results show that it can accelerate programming speed by 31x and reduce RBER by 1000x for the MLC flash memory, and it can broaden the available window of threshold voltage up to 5.1x for the TLC flash memory.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83551396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Computation of ECO Patch Functions","authors":"A. Dao, Nian-Ze Lee, Li-Cheng Chen, Mark Po-Hung Lin, J. H. Jiang, A. Mishchenko, R. Brayton","doi":"10.1145/3195970.3196039","DOIUrl":"https://doi.org/10.1145/3195970.3196039","url":null,"abstract":"Engineering Change Orders (ECO) modify a synthesized netlist after its specification has changed. ECO is divided into two major tasks: finding target signals whose functions should be updated and synthesizing the patch that produces the desired change. This paper proposes an efficient SAT-based solution for the second task: resource-aware computation of multi-output patch functions. The solution is based on several new algorithms and outperforms the top three winners of the 2017 ICCAD CAD Contest (Problem A).","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"47 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90706485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VRL-DRAM: Improving DRAM Performance via Variable Refresh Latency","authors":"Anup Das, Hasan Hassan, O. Mutlu","doi":"10.1145/3195970.3196136","DOIUrl":"https://doi.org/10.1145/3195970.3196136","url":null,"abstract":"A DRAM chip requires periodic refresh operations to prevent data loss due to charge leakage in DRAM cells. Refresh operations incur significant performance overhead as a DRAM bank/rank becomes unavailable to service access requests while being refreshed. In this work, our goal is to reduce the performance overhead of DRAM refresh by reducing the latency of a refresh operation. We observe that a significant number of DRAM cells can retain their data for longer than the worst-case refresh period of 64ms. Such cells do not always need to be fully refreshed; a low-latency partial refresh is sufficient for them. We propose Variable Refresh Latency DRAM (VRL-DRAM), a mechanism that fully refreshes a DRAM cell only when necessary, and otherwise ensures data integrity by issuing low-latency partial refresh operations. We develop a new detailed analytical model to estimate the minimum latency of a refresh operation that ensures data integrity of a cell with a given retention time profile. We evaluate VRL-DRAM with memory traces from real workloads, and show that it reduces the average refresh performance overhead by 34% compared to the state-of-the-art approach.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"37 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89955223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Context-Based Authentication in IoT","authors":"Markus Miettinen, T. D. Nguyen, A. Sadeghi, N. Asokan","doi":"10.1145/3195970.3196106","DOIUrl":"https://doi.org/10.1145/3195970.3196106","url":null,"abstract":"The emergence of IoT poses new challenges towards solutions for authenticating numerous very heterogeneous IoT devices to their respective trust domains. Using passwords or pre-defined keys has drawbacks that limit their use in IoT scenarios. Recent works propose to use contextual information about ambient physical properties of devices' surroundings as a shared secret to mutually authenticate devices that are co-located, e.g., in the same room. In this paper, we analyze these context-based authentication solutions with regard to their security and requirements on context quality. We quantify their achievable security based on empirical real-world data from context measurements in typical IoT environments.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"18 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72880805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}