{"title":"SEDONUT: A Single Event Double Node Upset Tolerant SRAM for Terrestrial Applications","authors":"Govind Prasad, Bipin Chnadra Mandi, Maifuz Ali","doi":"10.1145/3651985","DOIUrl":"https://doi.org/10.1145/3651985","url":null,"abstract":"<p>The radiation and its effect on neighboring nodes are critical not only for space applications but also for terrestrial applications at modern lower technology nodes. This may cause SRAM failures due to single and multi-node upset. Hence, this paper proposes a 14T radiation-hardened-based SRAM cell to overcome soft errors for space and critical terrestrial applications. Simulation results show that the proposed cell can be resilient to any single event upset and single event double node upset at its storage nodes. This cell uses less power than others. The hold, read, and write stability increases compared to most considered cells. The higher critical charge of the proposed SRAM increases radiation resistance. Simulation results demonstrate that out of all compared SRAMs, only DNUSRM and proposed SRAM show 0% probability of logical flipping. Also, the other parameters like total critical charge, write stability, read stability, hold stability, area, power, sensitive area, write speed, and read speed of the proposed SRAM are improved by -19.1%, 5.22%, 25.7%, -5.46%, 22.5%, 50.6%, 60.0%, 17.91%, and 0.74% compared to DNUSRM SRAM. Hence, the better balance among the parameters makes the proposed cell more suitable for space and critical terrestrial applications. Finally, the post-layout and Monte Carlo simulation validate the efficiency of SRAMs.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ping-Xiang Chen, Dongjoo Seo, Changhoon Sung, Jongheum Park, Minchul Lee, Huaicheng Li, Matias Bjørling, Nikil Dutt
{"title":"ZoneTrace: A Zone Monitoring Tool for F2FS on ZNS SSDs","authors":"Ping-Xiang Chen, Dongjoo Seo, Changhoon Sung, Jongheum Park, Minchul Lee, Huaicheng Li, Matias Bjørling, Nikil Dutt","doi":"10.1145/3656172","DOIUrl":"https://doi.org/10.1145/3656172","url":null,"abstract":"<p>We present <monospace>ZoneTrace</monospace>, a runtime monitoring tool for the Flash-Friendly File System (F2FS) on Zoned Namespace (ZNS) SSDs. ZNS SSD organizes its storage into zones of sequential write access. Due to ZNS SSD’s sequential write nature, F2FS is a log-structured file system that has recently been adopted to support ZNS SSDs. To present the space management with the zone concept between F2FS and the underlying ZNS SSD, we developed <monospace>ZoneTrace</monospace>, a tool that enables users to visualize and analyze the space management of F2FS on ZNS SSDs. <monospace>ZoneTrace</monospace> utilizes the extended Berkeley Packet Filter (eBPF) to trace the updated segment bitmap in F2FS and visualize each zone space usage accordingly. Furthermore, <monospace>ZoneTrace</monospace> is able to analyze on file fragmentation in F2FS and provides users with informative fragmentation histogram to serve as an indicator of file fragmentation. Using <monospace>ZoneTrace</monospace>’s visualization, we are able to identify the current F2FS space management scheme’s inability to fully optimize space for streaming data recording in autonomous systems, which leads to serious file fragmentation on ZNS SSDs. Our evaluations show that <monospace>ZoneTrace</monospace> is lightweight and assists users in getting useful insights for effortless monitoring on F2FS with ZNS SSD with both synthetic and realistic workloads. We believe <monospace>ZoneTrace</monospace> can help users analyze F2FS with ease and open up space management research topics with F2FS on ZNS SSDs.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wenyan Yan, Dongsheng Wei, Bin Fu, Renfa Li, Guoqi Xie
{"title":"A Mixed-Criticality Traffic Scheduler with Mitigating Congestion for CAN-to-TSN Gateway","authors":"Wenyan Yan, Dongsheng Wei, Bin Fu, Renfa Li, Guoqi Xie","doi":"10.1145/3656173","DOIUrl":"https://doi.org/10.1145/3656173","url":null,"abstract":"<p>The network architecture that Time-Sensitive Networking (TSN) is used as the backbone network and the Controller Area Network (CAN) serves as the intra-domain network is considered as the CAN-TSN interconnection network architecture, which has gained considerable attention within industrial embedded networks, such as spacecraft, intelligent automobiles, and factory automation. The architecture employs the CAN-TSN gateway as a central hub for transmitting and managing a significant volume of communications between the CAN domains and TSN. However, the CAN-TSN gateway faces a high congestion challenge due to the rapid growth in data volume, making it difficult to effectively support different time planning mechanisms provided by TSN. In this paper, we propose a two-stage mixed-criticality traffic scheduler. The scheduler in the first stage adopts a Message Optimization Algorithm (MOA) to aggregate multiple CAN messages into a single TSN message (including the aggregation of critical and non-critical CAN messages), which reduces the number of CAN messages requiring transmission. In the second stage, the scheduler proposes a Message Scheduling Optimization Algorithm (MSOA) to schedule critical TSN messages. This algorithm reassembles all the critical CAN messages (within the un-schedulable TSN messages) to generate new TSN messages for rescheduling. Experimental results show that our proposed scheduler effectively improves the acceptance ratio of critical and non-critical CAN messages and outperforms the state-of-the-art message scheduling method in terms of acceptance ratio while improving the bandwidth utilization and the number of schedule table entries. We further construct a hardware platform to evaluate the performance of MSOA. The consistency between practical results and theoretical results shows the effectiveness of MSOA.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140573636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental Concolic Testing of Register-Transfer Level Designs","authors":"Hasini Witharana, Aruna Jayasena, Prabhat Mishra","doi":"10.1145/3655621","DOIUrl":"https://doi.org/10.1145/3655621","url":null,"abstract":"Concolic testing is a scalable solution for automated generation of directed tests for validation of hardware designs. Unfortunately, concolic testing fails to cover complex corner cases such as hard-to-activate branches. In this paper, we propose an incremental concolic testing technique to cover hard-to-activate branches in register-transfer level (RTL) models. We show that a complex branch condition can be viewed as a sequence of easy-to-activate events. We map the branch coverage problem to the coverage of a sequence of events. We propose an efficient algorithm to cover the sequence of events using concolic testing. Specifically, the test generated to activate the current event is used as the starting point to activate the next event in the sequence. Experimental results demonstrate that our approach can be used to generate directed tests to cover complex corner cases in RTL models while state-of-the-art methods fail to activate them.","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140362228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aritra Bagchi, Dharamjeet, Ohm Rishabh, Manan Suri, Preeti Ranjan Panda
{"title":"POEM: Performance Optimization and Endurance Management for Non-volatile Caches","authors":"Aritra Bagchi, Dharamjeet, Ohm Rishabh, Manan Suri, Preeti Ranjan Panda","doi":"10.1145/3653452","DOIUrl":"https://doi.org/10.1145/3653452","url":null,"abstract":"<p>Non-volatile memories (NVMs) with their high storage density and ultra-low leakage power offer promising potential for redesigning the memory hierarchy in next-generation Multi-Processor Systems-on-Chip (MPSoCs). However, the adoption of NVMs in cache designs introduces challenges such as NVM write overheads and limited NVM endurance. The shared NVM cache in an MPSoC experiences <i>requests</i> from different processor cores and <i>responses</i> from the off-chip memory when the requested data is not present in the cache. Besides, upon evictions of dirty data from higher-level caches, the shared NVM cache experiences another source of write operations, known as <i>writebacks</i>. These sources of write operations: writebacks and responses, further exacerbate the contention for the shared bandwidth of the NVM cache, and create significant performance bottlenecks. Uncontrolled write operations can also affect the endurance of the NVM cache, posing a threat to cache lifetime and system reliability. Existing strategies often address either performance or cache endurance individually, leaving a gap for a holistic solution. This study introduces the Performance Optimization and Endurance Management (POEM) methodology, a novel approach that aggressively bypasses cache writebacks and responses to alleviate the NVM cache contention. Contrary to the existing bypass policies which do not pay adequate attention to the shared NVM cache contention, and focus too much on cache data reuse, POEM’s aggressive bypass significantly improves the overall system performance, even at the expense of data reuse. POEM also employs effective wear leveling to enhance the NVM cache endurance by careful redistribution of write operations across different cache lines. Across diverse workloads, POEM yields an average speedup of (34% ) over a naïve baseline and (28.8% ) over a state-of-the-art NVM cache bypass technique, while enhancing the cache endurance by (15% ) over the baseline. POEM also explores diverse design choices by exploiting a key policy parameter that assigns varying priorities to the two system-level objectives.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140311393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Yang, Qi Xu, Hao Geng, Song Chen, Bei Yu, Yi Kang
{"title":"Floorplanning with Edge-Aware Graph Attention Network and Hindsight Experience Replay","authors":"Bo Yang, Qi Xu, Hao Geng, Song Chen, Bei Yu, Yi Kang","doi":"10.1145/3653453","DOIUrl":"https://doi.org/10.1145/3653453","url":null,"abstract":"<p>In this paper, we focus on chip floorplanning, which aims to determine the location and orientation of circuit macros simultaneously, so that the chip area and wirelength are minimized. As the highest level of abstraction in hierarchical physical design, floorplanning bridges the gap between the system-level design and the physical synthesis, whose quality directly influences downstream placement and routing. To tackle chip floorplanning, we propose an end-to-end reinforcement learning (RL) methodology with a hindsight experience replay technique. An edge-aware graph attention network (EAGAT) is developed to effectively encode the macro and connection features of the netlist graph. Moreover, we build a hierarchical decoder architecture mainly consisting of transformer and attention pointer mechanism to output floorplan actions. Since the RL agent automatically extracts knowledge about the solution space, the previously learned policy can be quickly transferred to optimize new unseen netlists. Experimental results demonstrate that, compared with state-of-the-art floorplanners, the proposed end-to-end methodology significantly optimizes area and wirelength on public GSRC and MCNC benchmarks.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140204154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chunlin Li, Kun Jiang, Yong Zhang, Lincheng Jiang, Youlong Luo, Shaohua Wan
{"title":"Deep Reinforcement Learning-Based Mining Task Offloading Scheme for Intelligent Connected Vehicles in UAV-Aided MEC","authors":"Chunlin Li, Kun Jiang, Yong Zhang, Lincheng Jiang, Youlong Luo, Shaohua Wan","doi":"10.1145/3653451","DOIUrl":"https://doi.org/10.1145/3653451","url":null,"abstract":"<p>The convergence of unmanned aerial vehicle (UAV)-aided mobile edge computing (MEC) networks and blockchain transforms the existing mobile networking paradigm. However, in the temporary hotspot scenario for intelligent connected vehicles (ICVs) in UAV-aided MEC networks, deploying blockchain-based services and applications in vehicles is generally impossible due to its high computational resource and storage requirements. One possible solution is to offload part of all the computational tasks to MEC servers wherever possible. Unfortunately, due to the limited availability and high mobility of the vehicles, there is still lacking simple solutions that can support low-latency and higher reliability networking services for ICVs. In this paper, we study the task offloading problem of minimizing the total system latency and the optimal task offloading scheme, subject to constraints on the hover position coordinates of the UAV, the fixed bonuses, flexible transaction fees, transaction rates, mining difficulty, costs and battery energy consumption of the UAV. The problem is confirmed to be a challenging linear integer planning problem, we formulate the problem as a constrained Markov decision process (CMDP). Deep Reinforcement Learning (DRL) has excellently solved sequential decision-making problems in dynamic ICVs environment, therefore, we propose a novel distributed DRL-based P-D3QN approach by using Prioritized Experience Replay (PER) strategy and the dueling double deep Q-network (D3QN) algorithm to solve the optimal task offloading policy effectively. Finally, experiment results show that compared with the benchmark scheme, the P-D3QN algorithm can bring about 26.24% latency improvement and increase about 42.26% offloading utility.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140170357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongduo Liu, Yijian Qian, Youqiang Liang, Bin Zhang, Zhaohan Liu, Tao He, Wenqian Zhao, Jiangbo Lu, Bei Yu
{"title":"A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs","authors":"Hongduo Liu, Yijian Qian, Youqiang Liang, Bin Zhang, Zhaohan Liu, Tao He, Wenqian Zhao, Jiangbo Lu, Bei Yu","doi":"10.1145/3652855","DOIUrl":"https://doi.org/10.1145/3652855","url":null,"abstract":"<p>In the digital era, the prevalence of low-quality images contrasts with the widespread use of high-definition displays, primarily due to low-resolution cameras and compression technologies. Image super-resolution (SR) techniques, particularly those leveraging deep learning, aim to enhance these images for high-definition presentation. However, real-time execution of deep neural network (DNN)-based SR methods at the edge poses challenges due to their high computational and storage requirements. To address this, field-programmable gate arrays (FPGAs) have emerged as a promising platform, offering flexibility, programmability, and adaptability to evolving models. Previous FPGA-based SR solutions have focused on reducing computational and memory costs through aggressive simplification techniques, often sacrificing the quality of the reconstructed images. This paper introduces a novel SR network specifically designed for edge applications, which maintains reconstruction performance while managing computation costs effectively. Additionally, we propose an architectural design that enables the real-time and end-to-end inference of the proposed SR network on embedded FPGAs. Our key contributions include a tailored SR algorithm optimized for embedded FPGAs, a DSP-enhanced design that achieves a significant four-fold speedup, a novel scalable cache strategy for handling large feature maps, optimization of DSP cascade consumption, and a constraint optimization approach for resource allocation. Experimental results demonstrate that our FPGA-specific accelerator surpasses existing solutions, delivering superior throughput, energy efficiency, and image quality.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140151320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative Analysis of Dynamic Power Consumption of Parallel Prefix Adder","authors":"Ireneusz Brzozowski","doi":"10.1145/3651984","DOIUrl":"https://doi.org/10.1145/3651984","url":null,"abstract":"<p>The Newcomb-Benford law is the law, also known as Benford's law, of anomalous numbers stating that in many real-life numerical datasets, including physical and statistical ones, numbers have small initial digit. Numbers irregularity observed in nature leads to the question, is the arithmetical-logical unit, responsible for performing calculations in computers optimal? Are there other architectures, not as regular as commonly used Parallel Prefix Adders that can perform better, especially when operating on the datasets that are not purely random, but irregular,? In this article, structures of propagate-generate tree are compared including regular and irregular configurations – various structures are examined: regular, irregular, with gray cells only, with both gray and black and with higher valency cells. Performance is evaluated in terms of energy consumption. The evaluation was performed using the extended power model of static CMOS gates. The model is based on changes of vectors, naturally taking into account spatio-temporal correlations. The energy parameters of the designed cells were calculated on the basis of electrical (Spice) simulation. Designs and simulations were done in Cadence environment, calculations of the power dissipation were performed in Matlab. The results clearly show that there are PPA structures that perform much better for a specific type of numerical data. Negligent design can lead to an increase greater than two times of power consumption. The novel architectures of PPA described in this work might find practical applications in specialized adders dealing with numerical datasets, such as, for example, sine functions commonly used in digital signal processing.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140128363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Moshiur Rahman, Jim Geist, Daniel Xing, Yuntao Liu, Ankur Srivastava, Travis Meade, Yier Jin, Swarup Bhunia
{"title":"Security Evaluation of State Space Obfuscation of Hardware IP through a Red Team – Blue Team Practice","authors":"Md Moshiur Rahman, Jim Geist, Daniel Xing, Yuntao Liu, Ankur Srivastava, Travis Meade, Yier Jin, Swarup Bhunia","doi":"10.1145/3640461","DOIUrl":"https://doi.org/10.1145/3640461","url":null,"abstract":"<p>Due to the inclination towards a fab-less model of integrated circuit (IC) manufacturing, several untrusted entities get white-box access to the proprietary intellectual property (IP) blocks from diverse vendors. To this end, the untrusted entities pose security-breach threats in the form of piracy, cloning, and reverse engineering, sometimes threatening national security. Hardware obfuscation is a prominent countermeasure against such issues. Obfuscation allows for preventing the usage of the IP blocks without authorization from the IP owners. Due to finite state machine (FSM) transformation-based hardware obfuscation, the design’s FSM gets transformed to make it difficult for an attacker to reverse engineer the design. A secret key needs to be applied to make the FSM functional thus preventing the usage of the IP for unintended purposes. Although several hardware obfuscation techniques have been proposed, due to the inability to analyze the techniques from the attackers’ standpoint, numerous vulnerabilities inherent to the obfuscation methods go undetected unless a true adversary discovers them. In this paper, we present a collaborative approach between two entities - one acting as an attacker or <i>red team</i> and another as a defender or <i>blue team</i>, the first systematic approach to replicate the real attacker-defender scenario in the hardware security domain, which in return strengthens the FSM transformation-based obfuscation technique. The <i>blue team</i> transforms the underlying FSM of a gate-level netlist using state space obfuscation. The <i>red team</i> plays the role of an adversary or evaluator and tries to unlock the design by extracting the unlocking key or recovering the obfuscation circuitries. As the key outcome of this red team - blue team effort, a robust state space obfuscation methodology is evolved showing security promises.</p>","PeriodicalId":50944,"journal":{"name":"ACM Transactions on Design Automation of Electronic Systems","volume":null,"pages":null},"PeriodicalIF":1.4,"publicationDate":"2024-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140037170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}