2022 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers, Page 4

Latency analysis of self-suspending task chains

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774655

Tomasz Kloda, Jiyang Chen, A. Bertout, L. Sha, M. Caccamo

Citations: 1

Inter-IP Malicious Modification Detection through Static Information Flow Tracking

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774694

Zhaoxiang Liu, Orlando Arias, Weimin Fu, Yier Jin, Xiaolong Guo

Citations: 2

EM SCA & FI Self-Awareness and Resilience with Single On-chip Loop & ML Classifiers

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774588

A. Ghosh, D. Das, Santosh K. Ghosh, Shreyas Sen

Citations: 3

RedMulE: A Compact FP16 Matrix-Multiplication Accelerator for Adaptive Deep Learning on RISC-V-Based Ultra-Low-Power SoCs

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.48550/arXiv.2204.11192

Yvan Tortorella, L. Bertaccini, D. Rossi, L. Benini, Francesco Conti

Abstract: The fast proliferation of extreme-edge applications using Deep Learning (DL) based algorithms requires dedicated hardware to satisfy their latency, throughput, and precision requirements. While inference is achievable in practical cases, online finetuning and adaptation of general DL models are still highly challenging. One of the key stumbling blocks is the need for parallel floating-point operations, which are considered unaffordable on sub-100 mW extreme-edge SoCs. We tackle this problem with RedMulE (Reduced-precision matrix Multiplication Engine), a parametric low-power hardware accelerator for FP16 matrix multiplications (the main kernel of DL training and inference), conceived for tight integration within a cluster of tiny RISC-V cores based on the PULP (Parallel Ultra-Low-Power) architecture. In 22 nm technology, a 32-FMA RedMulE instance occupies just 0.07 mm² (14% of an 8-core RISC-V cluster) and achieves up to 666 MHz maximum operating frequency, for a throughput of 31.6 MAC/cycle (98.8% utilization). We reach a cluster-level power consumption of 43.5 mW and a full-cluster energy efficiency of 688 16-bit GFLOPS/W. Overall, RedMulE features up to 4.65× higher energy efficiency and 22× speedup over SW execution on 8 RISC-V cores.
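The utilization figure quoted in the RedMulE abstract can be reproduced with a short calculation. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and counting one MAC as two floating-point operations is an assumption not stated in the abstract. It does not attempt to rederive the reported GFLOPS/W, since the abstract does not say that the power and maximum-frequency figures share one operating point.

```python
# Figures taken from the abstract.
FMA_UNITS = 32          # size of the RedMulE instance (32-FMA)
MAC_PER_CYCLE = 31.6    # reported sustained throughput
F_MAX_HZ = 666e6        # reported maximum operating frequency


def utilization(mac_per_cycle: float, fma_units: int) -> float:
    """Fraction of the FMA units doing useful work each cycle."""
    return mac_per_cycle / fma_units


def peak_gflops(mac_per_cycle: float, freq_hz: float, flops_per_mac: int = 2) -> float:
    """Throughput in GFLOP/s, ASSUMING one MAC counts as `flops_per_mac` operations."""
    return mac_per_cycle * freq_hz * flops_per_mac / 1e9


if __name__ == "__main__":
    print(f"utilization: {utilization(MAC_PER_CYCLE, FMA_UNITS):.2%}")
    print(f"throughput at f_max: {peak_gflops(MAC_PER_CYCLE, F_MAX_HZ):.2f} GFLOP/s")
```

The ratio 31.6 / 32 is 0.9875, which agrees with the 98.8% utilization reported above once rounded to one decimal place.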

Citations: 8

RTSEC: Automated RTL Code Augmentation for Hardware Security Enhancement

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774745

Orlando Arias, Zhaoxiang Liu, Xiaolong Guo, Yier Jin, Shuo Wang

Citations: 2

Reliability Analysis of FinFET-Based SRAM PUFs for 16nm, 14nm, and 7nm Technology Nodes

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774735

S. Masoumian, G. Selimis, Rui Wang, G. Schrijen, S. Hamdioui, M. Taouil

Abstract: SRAM Physical Unclonable Functions (PUFs) are used commercially today for, among other things, security primitives such as key generation and authentication. The quality of the PUFs, and hence of the security primitives, depends on intrinsic variations, which are technology dependent. Therefore, to sustain the commercial usage of PUFs in cutting-edge technologies, it is important to properly model and evaluate their reliability. In this work, we evaluate SRAM PUF reliability using within-class Hamming distance (WCHD) for 16nm, 14nm, and 7nm, using simulations and silicon validation for both low-power and high-performance designs. The results show that our simulation models and expectations match the silicon measurements. From the experiments, we conclude the following: (1) SRAM PUFs are reliable in advanced FinFET technology nodes, i.e., the noise is low in 16nm, 14nm, and 7nm; (2) temperature variations have a marginal impact on reliability; and (3) both low-power and high-performance SRAMs can be used as a PUF without excessive need for error-correcting codes (ECCs).
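The reliability metric named in the abstract, within-class Hamming distance, is commonly computed as the fraction of bits that differ between a reference start-up response and later responses read from the same device. The following is a minimal sketch under that common definition; the function names and the 8-bit example data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def fractional_hamming_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of bit positions at which two equal-length responses differ."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    if not a:
        raise ValueError("responses must not be empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def wchd(reference: Sequence[int], measurements: Sequence[Sequence[int]]) -> List[float]:
    """Within-class Hamming distance of each re-measurement against the reference."""
    return [fractional_hamming_distance(reference, m) for m in measurements]


if __name__ == "__main__":
    # Hypothetical 8-bit responses from one device (illustrative data only).
    reference = [1, 0, 1, 1, 0, 0, 1, 0]
    measurements = [
        [1, 0, 1, 1, 0, 0, 1, 0],  # identical read-out
        [1, 0, 1, 1, 0, 0, 1, 1],  # one noisy bit
        [0, 0, 1, 1, 0, 1, 1, 0],  # two noisy bits
    ]
    print(wchd(reference, measurements))  # [0.0, 0.125, 0.25]
```

A lower value means fewer noisy bits between read-outs, and therefore less error correction needed before the response can serve as key material.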

Citations: 2

Algorithm-Hardware Co-Design for Efficient Brain-Inspired Hyperdimensional Learning on Edge

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774524

Yang Ni, Yeseong Kim, T. Simunic, M. Imani

Abstract: Machine learning methods have been widely utilized to provide high quality for many cognitive tasks. Running sophisticated learning tasks incurs high computational costs to process a large amount of learning data. Brain-inspired Hyperdimensional Computing (HDC) has been introduced as an alternative solution for lightweight learning on edge devices. However, HDC models still rely on accelerators to ensure real-time and efficient learning. These hardware designs are not commercially available and need a relatively long period to synthesize and fabricate after deriving the new applications. In this paper, we propose an efficient framework for accelerating HDC at the edge by fully utilizing the available computing power. We optimize HDC through algorithm-hardware co-design of the host CPU and existing low-power machine learning accelerators, such as the Edge TPU. We interpret the lightweight HDC learning model as a hyper-wide neural network to take advantage of the accelerator and machine learning platform. We further reduce the runtime cost of training by employing a bootstrap aggregating algorithm called bagging while maintaining the learning quality. We evaluate the performance of the proposed framework with several applications. Joint experiments on a mobile CPU and the Edge TPU show that our framework achieves 4.5× faster training and 4.2× faster inference compared to the baseline platform. In addition, our framework achieves 19.4× faster training and 8.9× faster inference compared to an embedded ARM CPU (Raspberry Pi) that consumes similar power.

Citations: 8

Optimizing CoW-based File Systems on Open-Channel SSDs with Persistent Memory

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774695

Runyu Zhang, Duo Liu, Chaoshu Yang, Xianzhang Chen, Lei Qiao, Yujuan Tan

Abstract: Block-based file systems, such as Btrfs, utilize the copy-on-write (CoW) mechanism to guarantee data consistency on solid-state drives (SSDs). Open-channel SSDs provide opportunities for in-depth optimization of block-based file systems. However, existing systems fail to co-design the two-layer semantics and cannot take full advantage of the open-channel characteristics. Specifically, synchronizing an overwrite in Btrfs will copy-on-write all pages in the update path and induce severe write amplification. In this paper, we propose a hybrid fine-grained copy-on-write and journaling mechanism (HyFiM) to address these problems. We first utilize persistent memory to preserve the address mapping table of the open-channel SSD. Then, we design an intra-FTL copy-on-write mechanism (IFCoW) that eliminates the recursive updates caused by overwrites. Finally, we devise fine-grained metadata journals (FGMJ) to guarantee the consistency of metadata with minimum overhead. We prototype HyFiM based on Btrfs in the Linux kernel. Comprehensive evaluations demonstrate that HyFiM outperforms Btrfs by 30.77% and 33.82% for sequential and random overwrites, respectively.

Citations: 0

XST: A Crossbar Column-wise Sparse Training for Efficient Continual Learning

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774660

Fan Zhang, Li Yang, Jian Meng, Jae-sun Seo, Yu Cao, Deliang Fan

Citations: 4

coxHE: A software-hardware co-design framework for FPGA acceleration of homomorphic computation

2022 Design, Automation & Test in Europe Conference & Exhibition (DATE) Pub Date : 2022-03-14 DOI: 10.23919/DATE54114.2022.9774559

Mingqin Han, Yilan Zhu, Qian Lou, Zimeng Zhou, Shanqing Guo, Lei Ju

Abstract: Data privacy has become a crucial concern in the AI and big data era. Fully homomorphic encryption (FHE) is a promising data privacy protection technique in which the entire computation is performed on encrypted data. However, the dramatic increase in computation workload restrains the usage of FHE in real-world applications. In this paper, we propose an FPGA accelerator design framework for CKKS-based HE. Since the KeySwitch operations are the primary performance bottleneck of FHE computation, we propose a low-latency design of the KeySwitch module with reduced intra-operation data dependency. Compared with the state-of-the-art FPGA-based key-switch implementation, which is written in Verilog, the proposed high-level synthesis (HLS) based design reduces the operation latency by 40%. Furthermore, we propose an automated design space exploration framework that generates optimal encryption parameters and accelerators for a given application kernel and target FPGA device. Experimental results for a set of real HE application kernels on different FPGA devices show that our HLS-based flexible design framework produces substantially better accelerator designs than a fixed-parameter HE accelerator in terms of security, approximation error, and overall performance.

Citations: 10