Tomasz Kloda, Jiyang Chen, A. Bertout, L. Sha, M. Caccamo
{"title":"Latency analysis of self-suspending task chains","authors":"Tomasz Kloda, Jiyang Chen, A. Bertout, L. Sha, M. Caccamo","doi":"10.23919/DATE54114.2022.9774655","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774655","url":null,"abstract":"Many cyber-physical systems are offloading computation-heavy programs to hardware accelerators (e.g., GPU and TPU) to reduce execution time. These applications will self-suspend between offloading data to the accelerators and obtaining the returned results. Previous efforts have shown that self-suspending tasks can cause scheduling anomalies, but none has examined inter-task communication. This paper aims to explore self-suspending tasks' data chain latency with periodic activation and asynchronous message passing. We first present the cause for suspension-induced delays and worst-case latency analysis. We then propose a rule for utilizing the hardware co-processors to reduce data chain latency and schedulability analysis. Simulation results show that the proposed strategy can improve overall latency while preserving system schedulability.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"48 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120897686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inter-IP Malicious Modification Detection through Static Information Flow Tracking","authors":"Zhaoxiang Liu, Orlando Arias, Weimin Fu, Yier Jin, Xiaolong Guo","doi":"10.23919/DATE54114.2022.9774694","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774694","url":null,"abstract":"To help expand the usage of formal methods in the hardware security domain. We propose a static register-transfer level (RTL) security analysis framework and an electronic design automation (EDA) tool named If-Tracker to support the proposed framework. Through this framework, a data-flow model will be automatically extracted from the RTL description of the SoC. Information flow security properties will then be generated. The tool checks all possible inter-IP paths to verify whether any property violations exist. The effectiveness of the proposed framework is demonstrated on customized SoC designs using AMBA bus where malicious modifications are inserted across multiple IPs. Existing IP level security analysis tools cannot detect such Trojans. Compared to commercial formal tools such as Cadence JasperGold and Synopsys VC-Formal, our framework provides a much simpler user interface and can identify more types of malicious modifications.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122528409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EM SCA & FI Self-Awareness and Resilience with Single On-chip Loop & ML Classifiers","authors":"A. Ghosh, D. Das, Santosh K. Ghosh, Shreyas Sen","doi":"10.23919/DATE54114.2022.9774588","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774588","url":null,"abstract":"Securing ICs are becoming increasingly challenging with rapid improvements in electromagnetic (EM) side-channel analysis (SCA) and fault injection (FI) attacks. In this work, we develop a pro-active approach to detect and counter these attacks by embedding a single on-chip integrated loop around a crypto core (AES-256), designed and fabricated using TSMC 65nm process. The measured results demonstrate that the proposed system 1) provides EM-Self-awareness by acting as an on-chip H-field sensor, detecting voltage/clock glitching fault-attacks; 2) senses an approaching EM probe to detect any incoming threat; and 3) can be used to induce EM noise to increase resilience against EM attacks. This work combines EM analysis, ML based secured system and shows the efficacy by measurements from custom-built 65nm CMOS IC.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131214329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yvan Tortorella, L. Bertaccini, D. Rossi, L. Benini, Francesco Conti
{"title":"RedMulE: A Compact FP16 Matrix-Multiplication Accelerator for Adaptive Deep Learning on RISC-V-Based Ultra-Low-Power SoCs","authors":"Yvan Tortorella, L. Bertaccini, D. Rossi, L. Benini, Francesco Conti","doi":"10.48550/arXiv.2204.11192","DOIUrl":"https://doi.org/10.48550/arXiv.2204.11192","url":null,"abstract":"The fast proliferation of extreme-edge applications using Deep Learning (DL) based algorithms required dedicated hardware to satisfy extreme-edge applications' latency, through-put, and precision requirements. While inference is achievable in practical cases, online finetuning and adaptation of general DL models are still highly challenging. One of the key stumbling stones is the need for parallel floating-point operations, which are considered unaffordable on sub-100 mW extreme-edge SoCs. We tackle this problem with RedMulE (Reduced-precision ma-trix Multiplication Engine), a parametric low-power hardware accelerator for FP16 matrix multiplications - the main kernel of DL training and inference - conceived for tight integration within a cluster of tiny RISC- V cores based on the PULP (Parallel Ultra-Low-Power) architecture. In 22 nm technology, a 32-FMA RedMulE instance occupies just 0.07mm2(14% of an 8-core RISC- V cluster) and achieves up to 666 MHz maximum operating frequency, for a throughput of 31.6 MAC/cycle (98.8% utilization). We reach a cluster-level power consumption of 43.5 mW and a full-cluster energy efficiency of 688 16-bit GFLOPS/W. Overall, RedMulE features up to 4.65 x higher energy efficiency and 22 x speedup over SW execution on 8 RISC- V cores.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131254777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Orlando Arias, Zhaoxiang Liu, Xiaolong Guo, Yier Jin, Shuo Wang
{"title":"RTSEC: Automated RTL Code Augmentation for Hardware Security Enhancement","authors":"Orlando Arias, Zhaoxiang Liu, Xiaolong Guo, Yier Jin, Shuo Wang","doi":"10.23919/DATE54114.2022.9774745","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774745","url":null,"abstract":"Current hardware designs have increased in complexity, resulting in a reduced ability to perform security checks on them. Further, the addition of any security features to these designs is still largely manual which further complicates the design and integration process. In this paper, we address these shortcomings by introducing Rtsec as a framework which is capable of performing security analysis on designs as well as integrating security features directly into the HDL code, a feature that commercial EDA tools do not provide. Rtsec first breaks down HDL code into an Abstract Syntax Tree which is then used to infer the logic of the design. We demonstrate how Rtsec can be utilized to automatically include security mechanisms in RTL designs: watermarking and logic locking. We also compare the efficacy of our analysis algorithms with state of the art tools, demonstrating that Rtsec has capabilities equal or superior to those of state of the art tools while also providing the means of enhancing security features to the design.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133601110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Masoumian, G. Selimis, Rui Wang, G. Schrijen, S. Hamdioui, M. Taouil
{"title":"Reliability Analysis of FinFET-Based SRAM PUFs for 16nm, 14nm, and 7nm Technology Nodes","authors":"S. Masoumian, G. Selimis, Rui Wang, G. Schrijen, S. Hamdioui, M. Taouil","doi":"10.23919/DATE54114.2022.9774735","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774735","url":null,"abstract":"SRAM Physical Unclonable Functions (PUFs) are among other things today commercially used for secure primitives such as key generation and authentication. The quality of the PUFs and hence the security primitives, depends on intrinsic variations which are technology dependent. Therefore, to sustain the commercial usage of PUFs for cutting-edge technologies, it is important to properly model and evaluate their reliability. In this work, we evaluate the SRAM PUF reliability using within class Hamming distance (WCHD) for 16nm, 14nm, and 7nm using simulations and silicon validation for both low-power and high-performance designs. The results show that our simulation models and expectations match with the silicon measurements. From the experiments, we conclude the following: (1) SRAM PUF is reliable in advanced FinFET technology nodes, i.e., the noise is low in 16nm, 14nm, and 7nm, (2) temperature variations have a marginal impact on the reliability, and (3) both low-power and high-performance SRAMs can be used as a PUF without excessive need of error correcting codes (ECCs).","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132216962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithm-Hardware Co-Design for Efficient Brain-Inspired Hyperdimensional Learning on Edge","authors":"Yang Ni, Yeseong Kim, T. Simunic, M. Imani","doi":"10.23919/DATE54114.2022.9774524","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774524","url":null,"abstract":"Machine learning methods have been widely utilized to provide high quality for many cognitive tasks. Running sophisticated learning tasks requires high computational costs to process a large amount of learning data. Brain-inspired Hyperdimensional Computing (HDC) is introduced as an alternative solution for lightweight learning on edge devices. However, HDC models still rely on accelerators to ensure realtime and efficient learning. These hardware designs are not commercially available and need a relatively long period to synthesize and fabricate after deriving the new applications. In this paper, we propose an efficient framework for accelerating the HDC at the edge by fully utilizing the available computing power. We optimize the HDC through algorithm-hardware co-design of the host CPU and existing low-power machine learning accelerators, such as Edge TPU. We interpret the lightweight HDC learning model as a hyper-wide neural network to take advantage of the accelerator and machine learning platform. We further improve the runtime cost of training by employing a bootstrap aggregating algorithm called bagging while maintaining the learning quality. We evaluate the performance of the proposed framework with several applications. Joint experiments on mobile CPU and the Edge TPU show that our framework achieves 4.5 × faster training and 4.2 × faster inference compared to the baseline platform. In addition, our framework achieves 19.4 × faster training and 8.9 × faster inference as compared to embedded ARM CPU, Raspberry Pi, that consumes similar power consumption.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133896884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Runyu Zhang, Duo Liu, Chaoshu Yang, Xianzhang Chen, Lei Qiao, Yujuan Tan
{"title":"Optimizing CoW-based File Systems on Open-Channel SSDs with Persistent Memory","authors":"Runyu Zhang, Duo Liu, Chaoshu Yang, Xianzhang Chen, Lei Qiao, Yujuan Tan","doi":"10.23919/DATE54114.2022.9774695","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774695","url":null,"abstract":"Block-based file systems, such as Btrfs, utilize the copy-on-write (CoW) mechanism to guarantee data consistency on solid-state drives (SSDs). Open-channel SSD provides opportunities for in-depth optimization of block-based file systems. However, existing systems fail to co-design the two-layer semantics and cannot take full advantage of the open-channel characteristics. Specifically, synchronizing an overwrite in Btrfs will copy-on-write all pages in the update path and induce severe write amplification. In this paper, we propose a hybrid fine-grained copy-on-write and journaling mechanism (HyFiM) to address these problems. We first utilize persistent memories to preserve the address mapping table of open-channel SSD. Then, we design an intra-FTL copy-on-write mechanism (IFCoW) that eliminates the recursive updates caused by overwrites. Finally, we devise fine-grained metadata journals (FGMJ) to guarantee the consistency of metadata with minimum overhead. We prototype HyFiM based on Btrfs in the Linux kernel. Comprehensive evaluations demonstrate that HyFiM can outperform over Btrfs by 30.77% and 33.82% for sequential and random overwrites, respectively.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130346969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fan Zhang, Li Yang, Jian Meng, Jae-sun Seo, Yu Cao, Deliang Fan
{"title":"XST: A Crossbar Column-wise Sparse Training for Efficient Continual Learning","authors":"Fan Zhang, Li Yang, Jian Meng, Jae-sun Seo, Yu Cao, Deliang Fan","doi":"10.23919/DATE54114.2022.9774660","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774660","url":null,"abstract":"Leveraging the ReRAM crossbar-based In-Memory-Computing (IMC) to accelerate single task DNN inference has been widely studied. However, using the ReRAM crossbar for continual learning has not been explored yet. In this work, we propose XST, a novel crossbar column-wise sparse training framework for continual learning. XST significantly reduces the training cost and saves inference energy. More importantly, it is friendly to existing crossbar-based convolution engine with almost no hardware overhead. Compared with the state-of-the-art CPG method, the experiments show that XST's accuracy achieves 4.95 % higher accuracy. Furthermore, XST demonstrates ~5.59 × training speedup and 1.5 × inference energy-saving.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115733271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mingqin Han, Yilan Zhu, Qian Lou, Zimeng Zhou, Shanqing Guo, Lei Ju
{"title":"coxHE: A software-hardware co-design framework for FPGA acceleration of homomorphic computation","authors":"Mingqin Han, Yilan Zhu, Qian Lou, Zimeng Zhou, Shanqing Guo, Lei Ju","doi":"10.23919/DATE54114.2022.9774559","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774559","url":null,"abstract":"Data privacy becomes a crucial concern in the AI and big data era. Fully homomorphic encryption (FHE) is a promising data privacy protection technique where the entire computation is performed on encrypted data. However, the dramatic increase of the computation workload restrains the usage of FHE for the real-world applications. In this paper, we propose an FPFA accelerator design framework for CKKS-based HE. While the KeySwitch operations are the primary performance bottleneck of FHE computation, we propose a low latency design of KeySwitch module with reduced intra-operation data dependency. Compared with the state-of-the-art FPGA based key-switch implementation that is based on Verilog, the proposed high-level synthesis (HLS) based design reduces the operation latency by 40%. Furthermore, we propose an automated design space exploration framework which generates optimal encryption parameters and accelerators for a given application kernel and the target FPGA device. Experimental results for a set of real HE application kernels on different FPGA devices show that our HLS-based flexible design framework produces substantially better accelerator design compared with a fixed-parameter HE accelerator in terms of security, approximation error, and overall performance.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114334123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}