{"title":"Reliable Memristive Neural Network Accelerators Based on Early Denoising and Sparsity Induction","authors":"Anlan Yu, Ning Lyu, Wujie Wen, Zhiyuan Yan","doi":"10.1109/ASP-DAC52403.2022.9712525","DOIUrl":"https://doi.org/10.1109/ASP-DAC52403.2022.9712525","url":null,"abstract":"Implementing deep neural networks (DNNs) in hardware is challenging due to the requirements of huge memory and computation associated with DNNs' primary operation—matrix-vector multiplications (MVMs). Memristive crossbar shows great potential to accelerate MVMs by leveraging its capability of in-memory computation. However, one critical obstacle to such a technique is potentially significant inference accuracy degradation caused by two primary sources of errors—the variations during computation and stuck-at-faults (SAFs). To overcome this obstacle, we propose a set of dedicated schemes to significantly enhance its tolerance against these errors. First, a minimum mean square error (MMSE) based denoising scheme is proposed to diminish the impact of variations during computation in the intermediate layers. To the best of our knowledge, this is the first work considering denoising in the intermediate layers without extra crossbar resources. Furthermore, MMSE early denoising not only stabilizes the crossbar computation results but also mitigates errors caused by low resolution analog-to-digital converters. Second, we propose a weights-to-crossbar mapping scheme by inverting bits to mitigate the impact of SAFs. The effectiveness of the proposed bit inversion scheme is analyzed theoretically and demonstrated experimentally. Finally, we propose to use L1 regularization to increase the network sparsity, as a greater sparsity not only further enhances the effectiveness of the proposed bit inversion scheme, but also facilitates other early denoising mechanisms. 
Experimental results show that our schemes can achieve 40%-78% accuracy improvement, for the MNIST and CIFAR10 classification tasks under different networks.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128236516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient On-Device Incremental Learning by Weight Freezing","authors":"Zehao Wang, Zhenli He, Hui Fang, Yi-Xiong Huang, Ying Sun, Yu Yang, Zhi-Yuan Zhang, Di Liu","doi":"10.1109/ASP-DAC52403.2022.9712563","DOIUrl":"https://doi.org/10.1109/ASP-DAC52403.2022.9712563","url":null,"abstract":"On-device learning has become a new trend for edge intelligence systems. In this paper, we investigate the on-device in-cremental learning problem, which targets to learn new classes on top of a well-trained model on the device. Incremental learning is known to suffer from catastrophic forgetting, i.e., a model learns new classes at the cost of forgetting the old classes. Inspired by model pruning techniques, we propose a new on-device incremental learning method based on weight freezing. The weight freezing in our framework plays two roles: 1) preserving the knowledge of the old classes; 2) boosting the training procedure. By means of weight freezing, we build up an efficient incremental learning framework which combines knowledge distillation to fine-tune the new model. We conduct extensive experiments on CIFAR100 and compare our method with two existing methods. The experimental results show that our method can achieve higher accuracy after incrementally learning new classes.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124368220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Streaming Accuracy: Characterizing Early Termination in Stochastic Computing","authors":"Hsuan Hsiao, Joshua San Miguel, J. Anderson","doi":"10.1109/ASP-DAC52403.2022.9712540","DOIUrl":"https://doi.org/10.1109/ASP-DAC52403.2022.9712540","url":null,"abstract":"Stochastic computing has garnered interest in the research community due to its ability to implement complicated compute with very small area footprints, at the cost of some accuracy and higher latency. With its unique tradeoffs between area, accuracy and latency, one commonly used technique to minimize area and latency is to early-terminate computation. Given this, it is useful to be able to measure and characterize how amenable a bitstream is to early termination. We present Streaming Accuracy, a metric that measures how far a bitstream is from its most early-terminable form. We show that it overcomes limitations of prior studies, and we characterize the design space for building stochastic circuits with early termination. We then propose a new hardware bitstream generator that produces bitstreams with optimal streaming accuracy.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134093293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Testing and Diagnosis Techniques for Carbon Nanotube-Based FPGAs","authors":"Kangwei Xu, Yuanqing Cheng","doi":"10.1109/asp-dac52403.2022.9712558","DOIUrl":"https://doi.org/10.1109/asp-dac52403.2022.9712558","url":null,"abstract":"As process technology shrinks into the nanometer-scale, the CMOS-based Field Programmable Gate Arrays (FPGAs) face big challenges in the scalability of performance and power consumption. Multi-walled Carbon Nanotube (MWCNT) serves as a promising candidate for Cu interconnects due to superior conductivity. Moreover, Carbon Nanotube Field Transistor (CNFET) also emerges as a prospective alternative to the conventional CMOS device because of its higher power efficiency and larger noise margin. However, the MWCNT interconnects exhibit significant variations due to an immature fabrication process, leading to delay faults. Furthermore, the non-ideal CNFET fabrication process may generate a few metallic-CNTs (m-CNTs), rendering correlated faulty blocks. In this paper, we propose a ring oscillator (RO) based testing technique to detect delay faults due to the process variations of MWCNT interconnects. In addition, a novel circuit design based on the lookup table (LUT) is applied to speed up the fault testing of CNT-based FPGAs. Finally, we propose a testing algorithm to detect m-CNTs in configurable logic blocks (CLBs). 
Experimental results show that the test application time for a 6-input LUT can be reduced by 35.49% compared to the conventional testing method, and the proposed algorithm can also achieve a high fault coverage with lower testing overheads.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122911774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy Harvesting Aware Multi-Hop Routing Policy in Distributed IoT System Based on Multi-Agent Reinforcement Learning","authors":"Wen Zhang, Tao Liu, Mimi Xie, Longzhuang Li, Dulal C. Kar, Chen Pan","doi":"10.1109/asp-dac52403.2022.9712528","DOIUrl":"https://doi.org/10.1109/asp-dac52403.2022.9712528","url":null,"abstract":"Energy harvesting technologies offer a promising solution to sustainably power an ever-growing number of Internet of Things (IoT) devices. However, due to the weak and transient natures of energy harvesting, IoT devices have to work intermittently rendering conventional routing policies and energy allocation strategies impractical. To this end, this paper, for the very first time, developed a distributed multi-agent reinforcement algorithm known as global actor-critic policy (GAP) to address the problem of routing policy and energy allocation together for the energy harvesting powered IoT system. At the training stage, each IoT device is treated as an agent and one universal model is trained for all agents to save computing resources. At the inference stage, packet delivery rate can be maximized. The experimental results show that the proposed GAP algorithm achieves ~ 1.28× and ~ 1.24× data transmission rate than that of the Q-table and ESDSRAA algorithm, respectively.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127784051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RTL Regression Test Selection using Machine Learning","authors":"G. Parthasarathy, Aabid Rushdi, Parivesh Choudhary, Saurav Nanda, Malan Evans, Hansika Gunasekara, Sridhar Rajakumar","doi":"10.1109/ASP-DAC52403.2022.9712550","DOIUrl":"https://doi.org/10.1109/ASP-DAC52403.2022.9712550","url":null,"abstract":"Regression testing is a technique to ensure that micro-electronic circuit design functionality is correct under iterative changes during the design process. This incurs significant costs in the hardware design and verification cycle in terms of productivity, machine and simulation software costs, and time - sometimes as much as 70% of the hardware design costs. We propose a machine learning approach to select a subset of tests from the set of all RTL regression tests for the design. Ideally, the selected subset should detect all failures that the full set of tests would have detected. Our approach learns characteristics of both RTL code and tests during the verification process to estimate the likelihood that a test will expose a bug introduced by an incremental design modification. This paper describes our approach to the problem and its implementation. 
We also present experiments on several real-world designs of various types with different types of test-suites that demonstrate significant time and resource savings while maintaining validation quality.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126676771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Avatar: Reinforcing Fault Attack Countermeasures in EDA with Fault Transformations","authors":"P. Roy, Patanjali Slpsk, C. Rebeiro","doi":"10.1109/asp-dac52403.2022.9712539","DOIUrl":"https://doi.org/10.1109/asp-dac52403.2022.9712539","url":null,"abstract":"Cryptography hardware are highly vulnerable to a class of side-channel attacks known as Differential Fault Analysis (DFA). These attacks exploit fault induced errors to compromise secret keys from ciphers within a few seconds. A bias in the error probabilities strengthens the attack considerably. It abets in bypassing countermeasures and is also the basis of powerful attack variants like the Differential Fault Intensity Analysis (DFIA) and Statistical Ineffective Fault Analysis (SIFA). In this paper, we make two significant contributions. First, we identify the correlation between fault induced errors and gatelevel parameters like the threshold voltage, gate size, and ${V_{text{DD}}}$. We show how these parameters can influence the bias in the error probabilities. Then, we propose an algorithm, called Avatar, that carefully tunes gate-level parameters to strengthen the redundancy countermeasures against DFA, DFIA, and SIFA attacks with no additional logic needed. The central idea of Avatar is to reconfigure gates in the redundant circuits so that each circuit has a unique behavior to faults, making fault detection much more efficient. In AES for instance, fault attack resistance improves by 40% for DFA and DFIA, and 99% in the case of SIFA. Avatar incurs negligible area overheads and can be quickly adopted in any cipher design. 
It can be incorporated in commercial EDA flows and provides users with tunable knobs to trade-off performance and power consumption, for fault attack security.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125309817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Precision Deep Neural Network Acceleration on FPGAs","authors":"Negar Neda, Salim Ullah, A. Ghanbari, H. Mahdiani, M. Modarressi, Akash Kumar","doi":"10.1109/asp-dac52403.2022.9712485","DOIUrl":"https://doi.org/10.1109/asp-dac52403.2022.9712485","url":null,"abstract":"Quantization is a promising approach to reduce the computational load of neural networks. The minimum bit-width that preserves the original accuracy varies significantly across different neural networks and even across different layers of a single neural network. Most existing designs over-provision neural network accelerators with sufficient bit-width to preserve the required accuracy across a wide range of neural networks. In this paper, we present mpDNN, a multi-precision multiplier with dynamically adjustable bit-width for deep neural network acceleration. The design supports run-time splitting an arithmetic operator into multiple independent operators with smaller bit-width, effectively increasing throughput when lower precision is required. The proposed architecture is designed for FPGAs, in that the multipliers and bit-width adjustment mechanism are optimized for the LUT-based structure of FPGAs. Experimental results show that by enabling run-time precision adjustment, mpDNN can offer 3-15x improvement in throughput.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134432091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight and Secure Branch Predictors against Spectre Attacks","authors":"Congcong Chen, Chaoqun Shen, Jiliang Zhang","doi":"10.1109/asp-dac52403.2022.9712481","DOIUrl":"https://doi.org/10.1109/asp-dac52403.2022.9712481","url":null,"abstract":"Spectre attacks endanger most of CPUs, operating systems and cloud services due to the sharing of branch predic- tors in modern processors, while existing defenses fail to balance the security and overhead. This paper designs a lightweight and secure branch predictor (LS-BP), which provides lightweight hardware isolation for different branch entries of same-address- space and cross-address-space. Therefore, it is difficult for the attacker to establish branch conflicts. Experimental results show the average performance overhead is less than 3% while providing strong protection.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132910323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Thermal Analysis for Chiplet Design based on Graph Convolution Networks","authors":"Liang Chen, Wentian Jin, S. Tan","doi":"10.1109/ASP-DAC52403.2022.9712583","DOIUrl":"https://doi.org/10.1109/ASP-DAC52403.2022.9712583","url":null,"abstract":"2.5D chiplet-based technology promises an efficient integration technique for advanced designs with more functionality and higher performance. Temperature and related thermal optimization, heat removal are of critical importance for temperature-aware physical synthesis for chiplets. This paper presents a novel graph convolutional networks (GCN) architecture to estimate the thermal map of the 2.5D chiplet-based systems with the thermal resistance networks built by the compact thermal model (CTM). First, we take the total power of all chiplets as an input feature, which is a global feature. This additional global information can overcome the limitation that the GCN can only extract local information via neighborhood aggregation. Second, inspired by convolutional neural networks (CNN), we add skip connection into the GCN to pass the global feature directly across the hidden layers with the concatenation operation. Third, to consider the edge embedding feature, we propose an edge-based attention mechanism based on the graph attention networks (GAT). Last, with the multiple aggregators and scalers of principle neighborhood aggregation (PNA) networks, we can further improve the modeling capacity of the novel GCN. The experimental results show that the proposed GCN model can achieve an average RMSE of 0.31 K and deliver a 2.6× speedup over the fast steady-state solver of open-source HotSpot based on SuperLU. More importantly, the GCN model demonstrates more useful generalization or transferable capability. 
Our results show that the trained GCN can be directly applied to predict thermal maps of six unseen datasets with acceptable mean RMSEs of less than 0.67 K without retraining via inductive learning.","PeriodicalId":239260,"journal":{"name":"2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"2677 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133478091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}