ACM Transactions on Design Automation of Electronic Systems最新文献

A Single Bitline Highly Stable, Low Power With High Speed Half-Select Disturb Free 11T SRAM Cell 单比特线高稳定、低功耗、高速半选择无干扰 11T SRAM 单元

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-06-19 DOI: 10.1145/3653675

Lokesh Soni, Neeta Pandey

{"title":"A Single Bitline Highly Stable, Low Power With High Speed Half-Select Disturb Free 11T SRAM Cell","authors":"Lokesh Soni, Neeta Pandey","doi":"10.1145/3653675","DOIUrl":"https://doi.org/10.1145/3653675","url":null,"abstract":"A half-select disturb-free 11T (HF11T) static random access memory (SRAM) cell with low power, better stability and high speed is presented in this paper. The proposed SRAM cell works well with bit-interleaving design, which enhances soft-error immunity. A comparison of the proposed HF11T cell with other cutting-edge designs such as single-ended HS free 11T (SEHF11T), a shared-pass-gate 11T (SPG11T), data-dependent stack PMOS switching 10T (DSPS10T), a single-ended half-selected robust 12T (HSR12T), and 11T SRAM cells has been made. It exhibits 4.85 × /9.19 × less read delay (TRA) and write delay (TWA), respectively as compared to other considered SRAM cells. It achieves 1.07 × /1.02 × better read and write stability, respectively than the considered SRAM cells. It shows maximum reduction of 1.68 × /4.58 × /94.72 × /9 × /145 × leakage power, read power, write power consumption, read power delay product (PDP) and write PDP respectively, than the considered SRAM cells. In addition, the proposed HF11T cell achieves 10.14 × higher Ion/Ioff ratio than the other compared cells. These improvements come with a trade-off, resulting in 1.13 × more TRA compared to SPG11T. The simulation is performed with Cadence Virtuoso 45nm CMOS technology at supply voltage (VDD) of 0.6 V.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"59 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141505520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Advancing Hyperdimensional Computing Based on Trainable Encoding and Adaptive Training for Efficient and Accurate Learning 推进基于可训练编码和自适应训练的超维计算，实现高效准确学习

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-06-04 DOI: 10.1145/3665891

Jiseung Kim, Hyunsei Lee, Mohsen Imani, Yeseong Kim

{"title":"Advancing Hyperdimensional Computing Based on Trainable Encoding and Adaptive Training for Efficient and Accurate Learning","authors":"Jiseung Kim, Hyunsei Lee, Mohsen Imani, Yeseong Kim","doi":"10.1145/3665891","DOIUrl":"https://doi.org/10.1145/3665891","url":null,"abstract":"Hyperdimensional computing (HDC) is a computing paradigm inspired by the mechanisms of human memory, characterizing data through high-dimensional vector representations, known as hypervectors. Recent advancements in HDC have explored its potential as a learning model, leveraging its straightforward arithmetic and high efficiency. The traditional HDC frameworks are hampered by two primary static elements: randomly generated encoders and fixed learning rates. These static components significantly limit model adaptability and accuracy. The static, randomly generated encoders, while ensuring high-dimensional representation, fail to adapt to evolving data relationships, thereby constraining the model’s ability to accurately capture and learn from complex patterns. Similarly, the fixed nature of the learning rate does not account for the varying needs of the training process over time, hindering efficient convergence and optimal performance. This paper introduces (mathsf {TrainableHD} ), a novel HDC framework that enables dynamic training of the randomly generated encoder depending on the feedback of the learning data, thereby addressing the static nature of conventional HDC encoders. (mathsf {TrainableHD} ) also enhances the training performance by incorporating adaptive optimizer algorithms in learning the hypervectors. We further refine (mathsf {TrainableHD} ) with effective quantization to enhance efficiency, allowing the execution of the inference phase in low-precision accelerators. Our evaluations demonstrate that (mathsf {TrainableHD} ) significantly improves HDC accuracy by up to 27.99% (averaging 7.02%) without additional computational costs during inference, achieving a performance level comparable to state-of-the-art deep learning models. Furthermore, (mathsf {TrainableHD} ) is optimized for execution speed and energy efficiency. Compared to deep learning on a low-power GPU platform like NVIDIA Jetson Xavier, (mathsf {TrainableHD} ) is 56.4 times faster and 73 times more energy efficient. This efficiency is further augmented through the use of Encoder Interval Training (EIT) and adaptive optimizer algorithms, enhancing the training process without compromising the model’s accuracy.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"21 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141253460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators 面向机器学习加速器的基于 ML 的开源全栈优化框架

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-05-11 DOI: 10.1145/3664652

Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng

{"title":"An Open-Source ML-Based Full-Stack Optimization Framework for Machine Learning Accelerators","authors":"Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew Kahng, Joon Kyung Kim, Sean Kinzer, Sayak Kundu, Rohan Mahapatra, Susmita Dey Manasi, Sachin Sapatnekar, Zhiang Wang, Ziqing Zeng","doi":"10.1145/3664652","DOIUrl":"https://doi.org/10.1145/3664652","url":null,"abstract":"Parameterizable machine learning (ML) accelerators are the product of recent breakthroughs in ML. To fully enable their design space exploration (DSE), we propose a physical-design-driven, learning-based prediction framework for hardware-accelerated deep neural network (DNN) and non-DNN ML algorithms. It adopts a unified approach that combines power, performance, and area (PPA) analysis with frontend performance simulation, thereby achieving a realistic estimation of both backend PPA and system metrics such as runtime and energy. In addition, our framework includes a fully automated DSE technique, which optimizes backend and system metrics through an automated search of architectural and backend parameters. Experimental studies show that our approach consistently predicts backend PPA and system metrics with an average 7% or less prediction error for the ASIC implementation of two deep learning accelerator platforms, VTA and VeriGOOD-ML, in both a commercial 12 nm process and a research-oriented 45 nm process.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"2674 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140929019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Data Pruning-enabled High Performance and Reliable Graph Neural Network Training on ReRAM-based Processing-in-Memory Accelerators 在基于 ReRAM 的内存处理加速器上进行数据剪枝，实现高性能、可靠的图神经网络训练

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-05-03 DOI: 10.1145/3656171

Chukwufumnanya Ogbogu, Biresh K. Joardar, Krishnendu Chakrabarty, Jana Doppa, Partha Pratim Pande

{"title":"Data Pruning-enabled High Performance and Reliable Graph Neural Network Training on ReRAM-based Processing-in-Memory Accelerators","authors":"Chukwufumnanya Ogbogu, Biresh K. Joardar, Krishnendu Chakrabarty, Jana Doppa, Partha Pratim Pande","doi":"10.1145/3656171","DOIUrl":"https://doi.org/10.1145/3656171","url":null,"abstract":"Graph Neural Networks (GNNs) have achieved remarkable accuracy in cognitive tasks such as predictive analytics on graph-structured data. Hence, they have become very popular in diverse real-world applications. However, GNN training with large real-world graph datasets in edge-computing scenarios is both memory- and compute-intensive. Traditional computing platforms such as CPUs and GPUs do not provide the energy efficiency and low latency required in edge intelligence applications due to their limited memory bandwidth. Resistive random-access memory (ReRAM)-based processing-in-memory (PIM) architectures have been proposed as suitable candidates for accelerating AI applications at the edge, including GNN training. However, ReRAM-based PIM architectures suffer from low reliability due to their limited endurance, and low performance when they are used for GNN training in real-world scenarios with large graphs. In this work, we propose a learning-for-data-pruning framework, which leverages a trained Binary Graph Classifier (BGC) to reduce the size of the input data graph by pruning subgraphs early in the training process to accelerate the GNN training process on ReRAM-based architectures. The proposed light-weight BGC model reduces the amount of redundant information in input graph(s) to speed up the overall training process, improves the reliability of the ReRAM-based PIM accelerator, and reduces the overall training cost. This enables fast, energy-efficient, and reliable GNN training on ReRAM-based architectures. Our experimental results demonstrate that using this learning for data pruning framework, we can accelerate GNN training and improve the reliability of ReRAM-based PIM architectures by up to 1.6 ×, and reduce the overall training cost by 100 × compared to state-of-the-art data pruning techniques.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"6 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140827622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

HLS-IRT: Hardware Trojan Insertion through Modification of Intermediate Representation During High-Level Synthesis HLS-IRT：在高层合成过程中通过修改中间表示法插入硬件木马

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-05-03 DOI: 10.1145/3663477

Rijoy Mukherjee, Archisman Ghosh, Rajat Subhra Chakraborty

{"title":"HLS-IRT: Hardware Trojan Insertion through Modification of Intermediate Representation During High-Level Synthesis","authors":"Rijoy Mukherjee, Archisman Ghosh, Rajat Subhra Chakraborty","doi":"10.1145/3663477","DOIUrl":"https://doi.org/10.1145/3663477","url":null,"abstract":"Modern integrated circuit (IC) design incorporates the usage of proprietary computer-aided design (CAD) software and integration of third-party hardware intellectual property (IP) cores. Subsequently, the fabrication process for the design takes place in untrustworthy offshore foundries that raises concerns regarding security and reliability. Hardware Trojans (HTs) are difficult to detect malicious modifications to IC that constitute a major threat, which if undetected prior to deployment, can lead to catastrophic functional failures or the unauthorized leakage of confidential information. Apart from the risks posed by rogue human agents, recent studies have shown that high-level synthesis (HLS) CAD software can serve as a potent attack vector for inserting Hardware Trojans (HTs). In this paper, we introduce a novel automated attack vector, which we term “HLS-IRT”, by inserting HT in the register transfer logic (RTL) description of circuits generated during a HLS based IC design flow, by directly modifying the compiler-generated intermediate representation (IR) corresponding to the design. We demonstrate the attack using a design and implementation flow based on the open-source Bambu HLS software and Xilinx FPGA, on several hardware accelerators spanning different application domains. Our results show that the resulting HTs are surreptitious and effective, while incurring minimal design overhead. We also propose a novel detection scheme for HLS-IRT, since existing techniques are found to be inadequate to detect the proposed HTs.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"45 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140827826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

DeepOTF: Learning Equations-constrained Prediction for Electromagnetic Behavior DeepOTF：学习受方程约束的电磁行为预测

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-05-01 DOI: 10.1145/3663476

Peng Xu, Siyuan XU, Tinghuan Chen, Guojin Chen, Tsungyi Ho, Bei Yu

{"title":"DeepOTF: Learning Equations-constrained Prediction for Electromagnetic Behavior","authors":"Peng Xu, Siyuan XU, Tinghuan Chen, Guojin Chen, Tsungyi Ho, Bei Yu","doi":"10.1145/3663476","DOIUrl":"https://doi.org/10.1145/3663476","url":null,"abstract":"High-quality passive devices are becoming increasingly important for the development of mobile devices and telecommunications, but obtaining such devices through simulation and analysis of electromagnetic (EM) behavior is time-consuming. To address this challenge, artificial neural network (ANN) models have emerged as an effective tool for modeling EM behavior, with NeuroTF being a representative example. However, these models are limited by the specific form of the transfer function, leading to discontinuity issues and high sensitivities. Moreover, previous methods have overlooked the physical relationship between distributed parameters, resulting in unacceptable numeric errors in the conversion results. To overcome these limitations, we propose two different neural network architectures: DeepOTF and ComplexTF. DeepOTF is a data-driven deep operator network for automatically learning feasible transfer functions for different geometric parameters. ComplexTF utilizes complex-valued neural networks to fit feasible transfer functions for different geometric parameters in the complex domain while maintaining causality and passivity. Our approach also employs an Equations-constraint Learning scheme to ensure the strict consistency of predictions and a dynamic weighting strategy to balance optimization objectives. The experimental results demonstrate that our framework shows superior performance than baseline methods, achieving up to 1700 × higher accuracy. ","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"53 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140827670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Semi-Permanent Stuck-At Fault injection attacks on Elephant and GIFT lightweight ciphers 针对大象和 GIFT 轻型密码的半永久性卡顿故障注入攻击

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-04-29 DOI: 10.1145/3662734

Priyanka Joshi, Bodhisatwa Mazumdar

{"title":"Semi-Permanent Stuck-At Fault injection attacks on Elephant and GIFT lightweight ciphers","authors":"Priyanka Joshi, Bodhisatwa Mazumdar","doi":"10.1145/3662734","DOIUrl":"https://doi.org/10.1145/3662734","url":null,"abstract":"Fault attacks pose a potent threat to modern cryptographic implementations, particularly those used in physically approachable embedded devices in IoT environments. Information security in such resource-constrained devices is ensured using lightweight ciphers, where combinational circuit implementations of SBox are preferable over look-up tables (LUT) as they are more efficient regarding area, power, and memory requirements. Most existing fault analysis techniques focus on fault injection in memory cells and registers. Recently, a novel fault model and analysis technique, namely Semi-Permanent Stuck-At (SPSA) fault analysis, has been proposed to evaluate the security of ciphers with combinational circuit implementation of Substitution layer elements, SBox. In this work, we propose optimized techniques to recover the key in a minimum number of ciphertexts in such implementations of lightweight ciphers. Based on the proposed techniques, a key recovery attack on the NIST lightweight cryptography (NIST-LWC) standardization process finalist, <monospace>Elephant</monospace> AEAD, has been proposed. The proposed key recovery attack is validated on two versions of <monospace>Elephant</monospace> cipher. The proposed fault analysis approach recovered the secret key within 85 − 240 ciphertexts, calculated over 1000 attack instances. To the best of our knowledge, this is the first work on fault analysis attacks on the <monospace>Elephant</monospace> scheme. Furthermore, an optimized combinational circuit implementation of Spongent SBox (SBox used in <monospace>Elephant</monospace> cipher) is proposed, having a smaller gate count than the optimized implementation reported in the literature. The proposed fault analysis techniques are validated on primary and optimized versions of Spongent SBox through Verilog simulations. Further, we pinpoint SPSA hotspots in the lightweight <monospace>GIFT</monospace> cipher SBox architecture. We observe that <monospace>GIFT</monospace> SBox exhibits resilience towards the proposed SPSA fault analysis technique under the single fault adversarial model. However, eight SPSA fault patterns reduce the nonlinearity of the SBox to zero, rendering it vulnerable to linear cryptanalysis. Conclusively, SPSA faults may adversely affect the cryptographic properties of an SBox, thereby leading to trivial key recovery. The <monospace>GIFT</monospace> cipher is used as an example to focus on two aspects: i) its SBox construction is resilient to the proposed SPSA analysis and therefore characterizing such constructions for SPSA resilience and, ii) an SBox even though resilient to the proposed SPSA analysis, may exhibit vulnerabilities towards other classical analysis techniques when subjected to SPSA faults. Our work reports new vulnerabilities in fault analysis in the combinational circuit implementations of cryptographic protocols.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"1 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140810874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

gem5-NVDLA: A Simulation Framework for Compiling, Scheduling and Architecture Evaluation on AI System-on-Chips gem5-NVDLA：人工智能片上系统的编译、调度和架构评估仿真框架

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-04-29 DOI: 10.1145/3661997

Chengtao Lai, Wei Zhang

{"title":"gem5-NVDLA: A Simulation Framework for Compiling, Scheduling and Architecture Evaluation on AI System-on-Chips","authors":"Chengtao Lai, Wei Zhang","doi":"10.1145/3661997","DOIUrl":"https://doi.org/10.1145/3661997","url":null,"abstract":"Recent years have seen an increasing trend in designing AI accelerators together with the rest of the system, including CPUs and memory hierarchy. This trend calls for high-quality simulators or analytical models that enable such kind of co-exploration. Currently, the majority of such exploration is supported by AI accelerator analytical models. But such models usually overlook the non-trivial impact of congestion of shared resources, non-ideal hardware utilization and non-zero CPU scheduler overhead, which could only be modeled by cycle-level simulators. However, most simulators with full-stack toolchains are proprietary to corporations, and the few open-source simulators are suffering from either weak compilers or limited space of modeling. This framework resolves these issues by proposing a compilation and simulation flow to run arbitrary Caffe neural network models on the NVIDIA Deep Learning Accelerator (NVDLA) with gem5, a cycle-level simulator, and by adding more building blocks including scratchpad allocation, multi-accelerator scheduling, tensor-level prefetching mechanisms and a DMA-aided embedded buffer to map workload to multiple NVDLAs. The proposed framework has been tested and verified on a set of convolution neural networks, showcasing the capability of modeling complex buffer management strategies, scheduling policies and hardware architectures. As a case study of this framework, we demonstrate the importance of adopting different buffering strategies for activation and weight tensors in AI accelerators to acquire remarkable speedup.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":"31 1","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140810870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Enhanced Watermarking for Paper-Based Digital Microfluidic Biochips 纸质数字微流控生物芯片的增强型水印技术

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-04-29 DOI: 10.1145/3661309

Jian-De Li, Sying-Jyan Wang, Katherine Shu-Min Li, Tsung-Yi Ho

引用次数: 0

Enhanced Compiler Technology for Software-based Hardware Fault Detection 基于软件的硬件故障检测的增强型编译器技术

IF 1.4 4区计算机科学

ACM Transactions on Design Automation of Electronic Systems Pub Date : 2024-04-22 DOI: 10.1145/3660524

Davide Baroffio, Federico Reghenzani, William Fornaciari

引用次数: 0