{"title":"Reducing DRAM Access Latency via Helper Rows","authors":"Xin Xin, Youtao Zhang, Jun Yang","doi":"10.1109/DAC18072.2020.9218719","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218719","url":null,"abstract":"The DRAM technology advancement has seen success in memory density and throughput improvement, but less in access latency reduction. This is mainly due to the intrinsic limitation of capacitance based bit store and access mechanism. The reduction of access latency has been well explored in literature. However, the recently proposed DRAM techniques, such as RowClone and Half-DRAM, offer new opportunities to further optimise the access latency. In this paper, we propose an efficient access strategy to improve the performance of DRAM by optionally discarding the restore. When activating a new row, our technique makes a copy of the row leveraging the RowClone method. Next time when accessing the same row, the cloned row is opened for sensing but it is not restored as the data is preserved in the original row. To improve the efficiency of our proposed strategy, we further exploit three schemes to minimize the copy overhead and increase the reuse of the cloned row. Experimental results show that our proposed strategy can achieve 11% performance improvement on average.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116601753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Pragmatic Approach to On-device Incremental Learning System with Selective Weight Updates","authors":"Jaekang Shin, Seungkyu Choi, Yeongjae Choi, L. Kim","doi":"10.1109/DAC18072.2020.9218507","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218507","url":null,"abstract":"Incremental learning is drawing attention to widen capabilities of device-AI. Previous works have researched to reduce numerous computations and memory accesses required for the training process of IL, but they could not show a noticeable improvement in the weight gradient computation (WGC) phase. Therefore, we propose a selective weight update technique that searches for critical weights to be updated by applying the IL algorithm that trains per-task binary masks. Also, we introduce a novel dataflow for the implementation of selective WGC on typical NPUs with minimum overheads. On average, our system shows a 2.9× speed up and 2.5× energy efficiency in WGC without degrading training quality.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117008616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Layer RBER Variation Aware Read Performance Optimization for 3D Flash Memories","authors":"Shiqiang Nie, Youtao Zhang, Weiguo Wu, Jun Yang","doi":"10.1109/DAC18072.2020.9218631","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218631","url":null,"abstract":"3D NAND flash enables the construction of large capacity Solid-State Drives (SSDs) for modern computer systems. While effectively reducing per bit cost, 3D NAND flash exhibits non-negligible process variations and thus RBER (raw bit error rate) difference across layers, which leads to sub-optimal read performance for applications with either small or large I/O requests. In this paper, we propose LRR, Layer RBER variation aware Read optimization schemes, to address the challenge. LRR consists of two schemes — LRR subpage read scheduling (SRS) and LRR fullpage allocation (FPA). SRS groups small read requests from the layers with similar RBERs to reduce the average read latency of subpage sized read requests. FPA distributes the data of a large write to multiple layers, which improves the read latency when reading from layers with large RBERs. Our experimental results show that our proposed scheme LRR reduces 46% read latency on average over the state-of-the-art.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129922224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Formal Approach for Detecting Vulnerabilities to Transient Execution Attacks in Out-of-Order Processors","authors":"M. R. Fadiheh, Johannes Müller, R. Brinkmann, S. Mitra, D. Stoffel, W. Kunz","doi":"10.1109/DAC18072.2020.9218572","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218572","url":null,"abstract":"Transient execution attacks, such as Spectre and Meltdown, create a new and serious attack surface in modern processors. In spite of all countermeasures taken during recent years, the cycles of alarm and patch are ongoing and call for a better formal understanding of the threat and possible preventions. This paper introduces a formal definition of security with respect to transient execution attacks, formulated as a HW property. We present a formal method for security verification by HW property checking based on extending Unique Program Execution Checking (UPEC) to out-of-order processors. UPEC can be used to systematically detect all vulnerabilities to transient execution attacks, including vulnerabilities unknown so far. The feasibility of our approach is demonstrated at the example of the BOOM processor, which is a design with more than 650,000 state bits. In BOOM our approach detects a new, so far unknown vulnerability, called Spectre-STC, indicating that also single-threaded processors can be vulnerable to contention-based Spectre attacks.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"105 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123376483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Design of Large Area Flexible Electronics via Compressed Sensing","authors":"Leilai Shao, Ting Lei, Tsung-Ching Huang, Zhenan Bao, Kwang-Ting Cheng","doi":"10.1109/DAC18072.2020.9218570","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218570","url":null,"abstract":"Large area flexible electronics (FE) is emerging for low-cost, light-weight wearable electronics, artificial skins and IoT nodes, benefiting from its low-cost fabrication and mechanical flexibility. However, the low temperature requirement for fabrication on a flexible substrate and the large-area nature of flexible sensor arrays inevitably result in inadequate device yield, reliability and stability. Therefore, it is essential to develop design methodologies for large area sensing applications which can ensure system robustness without relying on highly reliable devices. Based on the observation that most signals sensed by body sensor arrays exhibit sparse statistical characteristics, we propose a system design method which leverages the sparse nature via compressed sensing (CS). Specifically, we use flexible circuitry to implement a CS encoder and decode the compressed signal in the silicon side. As a system demonstration, we fabricated the temperature sensor array, shift register and amplifier to illustrate the feasibility of the encoder design using carbon-nanotube-based flexible thin-film transistors. To evaluate the improvement of system robustness achieved by the proposed sensing schema, we conducted two case studies: temperature imaging and tactile-sensor based object recognition. With ∼10% sparse errors (due to either device defects or transient errors), we achieved reduction of root-mean-square-error (RMSE) from 0.20 to 0.05 for temperature sensing and boost the classification accuracy from 65% to 84% for tactile-sensing based object recognition.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124333142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning From A Big Brother - Mimicking Neural Networks in Profiled Side-channel Analysis","authors":"Daan van der Valk, Marina Krček, S. Picek, S. Bhasin","doi":"10.1109/DAC18072.2020.9218520","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218520","url":null,"abstract":"Recently, deep learning has emerged as a powerful technique for side-channel attacks, capable of even breaking common countermeasures. Still, trained models are generally large, and thus, performing evaluation becomes resource-intensive. The resource requirements increase in realistic settings where traces can be noisy, and countermeasures are active. In this work, we exploit mimicking to compress the learned models. We demonstrate up to 300 times compression of a state-of-the-art CNN. The mimic shallow network can also achieve much better accuracy as compared to when trained on original data and even reach the performance of a deeper network.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126527092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning to Set Meta-Heuristic Specific Parameters for High-Level Synthesis Design Space Exploration","authors":"Z. Wang, B. C. Schafer","doi":"10.1109/DAC18072.2020.9218674","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218674","url":null,"abstract":"Raising the level of VLSI design abstraction to C leads to many advantages compared to the use of low-level Hardware Description Languages (HDLs). One key advantage is that it allows the generation of micro-architectures with different trade-offs by simply setting unique combinations of synthesis options. Because the number of these synthesis options is typically very large, exhaustive enumerations are not possible. Hence, heuristics are required. Meta-heuristics like Simulated Annealing (SA), Genetic Algorithm (GA) and Ant Colony Optimizations (ACO) have shown to lead to good results for these types of multi-objective optimization problems. The main problem with these meta-heuristics is that they are very sensitive to their hyper-parameter settings, e.g. in the GA case, the mutation and crossover rate and the number of parents pairs. To address this, in this work we present a machine learning based approach to automatically set the search parameters for these three meta-heuristics such that a new unseen behavioral description given in C can be effectively explored. Moreover, we present an exploration technique that combines the SA, GA and ACO together and show that our proposed exploration method outperforms a single meta-heuristic.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127984312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routing Topology and Time-Division Multiplexing Co-Optimization for Multi-FPGA Systems","authors":"Tung-Wei Lin, Wei-Chen Tai, Yu-Cheng Lin, I. Jiang","doi":"10.1109/DAC18072.2020.9218667","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218667","url":null,"abstract":"Time-division multiplexing (TDM) is widely used to overcome bandwidth limitations and thus enhances routability in multi-FPGA systems due to the shortage of I/O pins in an FPGA. However, multiplexed signals induce significant delays. To evaluate timing degradation, nets with similar criticalities are often grouped to form NetGroups. In this paper, we propose a framework concerning routing topology and time-division multiplexing co-optimization for multi-FPGA systems. The proposed framework first generates high-quality topologies considering Net-Group criticalities. Then, inspired by column generation, TDM ratio assignment is solved optimally by Lagrangian relaxation. Experimental results show that our approach outperforms the top three entries of ICCAD 2019 CAD Contest. Moreover, our TDM ratio assignment algorithm can further improve the results of the top three winners to almost as good as ours.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125769240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topological Structure and Physical Layout Codesign for Wavelength-Routed Optical Networks-on-Chip","authors":"Yu-Sheng Lu, Sheng-Jung Yu, Yao-Wen Chang","doi":"10.1109/DAC18072.2020.9218625","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218625","url":null,"abstract":"The wavelength-routed optical network-on-chip (WRONoC) is a promising solution for signal transmission in modern system-on-chip (SoC) designs. Previous works do not handle three main issues for WRONoCs: correlations between the topological structure and physical layout, trade-offs between the maximum insertion loss and wavelength power, and a fully automated flow to generate predictable designs. As a result, the insertion loss estimation is inaccurate, and thus only suboptimal results are obtained. To remedy these disadvantages, we present a fully automated topological structure and physical layout codesign flow to minimize the maximum insertion loss and the wavelength power simultaneously with a significant speedup. Experimental results show that our codesign flow significantly outperforms state-of-the-art works in the maximum insertion loss, wavelength power, and runtimes.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115977592","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning to Predict IR Drop with Effective Training for ReRAM-based Neural Network Hardware","authors":"Sugil Lee, Giju Jung, M. Fouda, Jongeun Lee, A. Eltawil, F. Kurdahi","doi":"10.1109/DAC18072.2020.9218735","DOIUrl":"https://doi.org/10.1109/DAC18072.2020.9218735","url":null,"abstract":"Due to the inevitability of the IR drop problem in passive ReRAM crossbar arrays, finding a software solution that can predict the effect of IR drop without the need of expensive SPICE simulations, is very desirable. In this paper, two simple neural networks are proposed as software solution to predict the effect of IR drop. These networks can be easily integrated in any deep neural network framework to incorporate the IR drop problem during training. As an example, the proposed solution is integrated in BinaryNet framework and the test validation results, done through SPICE simulations, show very high improvement in performance close to the baseline performance, which demonstrates the efficacy of the proposed method. In addition, the proposed solution outperforms the prior work on challenging datasets such as CIFAR10 and SVHN.","PeriodicalId":428807,"journal":{"name":"2020 57th ACM/IEEE Design Automation Conference (DAC)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132472697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}