{"title":"Untrusted Code Compartmentalization for Bare Metal Embedded Devices","authors":"Liam Tyler;Ivan De Oliveira Nunes","doi":"10.1109/TCAD.2024.3444691","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3444691","url":null,"abstract":"Micro-controller units (MCUs) implement the de facto interface between the physical and digital worlds. As a consequence, they appear in a variety of sensing/actuation applications from smart personal spaces to complex industrial control systems and safety-critical medical equipment. While many of these devices perform safety- and time-critical tasks, they often lack support for security features compatible with their importance to overall system functions. This lack of architectural support leaves them vulnerable to run-time attacks that can remotely alter their intended behavior, with potentially catastrophic consequences. In particular, we note that, MCU software often includes untrusted third-party libraries (some of them closed-source) that are blindly used within MCU programs, without proper isolation from the rest of the system. In turn, a single vulnerability (or intentional backdoor) in one such third-party software can often compromise the entire MCU software state. In this article, we tackle this problem by proposing, demonstrating security, and formally verifying the implementation of UCCA: an Untrusted Code Compartment Architecture. UCCA provides flexible hardware-enforced isolation of untrusted code sections (e.g., third-party software modules) in resource-constrained and time-critical MCUs. To demonstrate UCCA’s practicality, we implement an open-source version of the design on a real resource-constrained MCU: the well-known TI MSP430. Our evaluation shows that UCCA incurs little overhead and is affordable even to lowest-end MCUs, requiring significantly less overhead and assumptions than the prior related work.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3419-3430"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dataflow-Aware Network-on-Interposer for CNN Inferencing in the Presence of Defective Chiplets","authors":"Harsh Sharma;Umit Ogras;Ananth Kalyanraman;Partha Pratim Pande","doi":"10.1109/TCAD.2024.3447210","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447210","url":null,"abstract":"The emergence of 2.5D chiplet platforms provides a new avenue for compact scale-out implementations of deep learning (DL) workloads (WLs). Integrating multiple small chiplets using a network-on-interposer (NoI) offers not only significant cost reduction and higher manufacturing yield than 2-D ICs but also better energy efficiency and performance. However, defects in chiplets may compromise performance since they restrict the computing capability. Therefore, carefully designed chiplet and NoI link placement, and task mapping schemes, in presence of defects, are necessary. In this article, we propose a defect-aware NoI design approach using a custom-defined space-filling curve (SFC) for efficient execution of mixed WLs of convolutional neural network (CNN) inference tasks. We demonstrate that the k-ary n-cube-based NoI topologies can be degenerated into SFC-based counterparts, which we refer to as SFCed NoI topologies. They enable high performance and energy efficiency with lower fabrication costs over their parent k-ary n-cube counterparts. The SFCed approach helps us to extract high performance from an inherently defective system. We demonstrate that SFCed design achieves up to 2.3× and 3.5× reduction in latency and energy, respectively, compared to parent NoI architectures while executing diverse DL WLs.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4190-4201"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745841","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ECG: Augmenting Embedded Operating System Fuzzing via LLM-Based Corpus Generation","authors":"Qiang Zhang;Yuheng Shen;Jianzhong Liu;Yiru Xu;Heyuan Shi;Yu Jiang;Wanli Chang","doi":"10.1109/TCAD.2024.3447220","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447220","url":null,"abstract":"Embedded operating systems (Embedded OSs) power much of our critical infrastructure but are, in general, much less tested for bugs than general-purpose operating systems. Fuzzing Embedded OSs encounter significant roadblocks due to much less documented specifications, an inherent ineffectiveness in generating high-quality payloads. In this article, we propose ECG, an Embedded OS fuzzer empowered by large language models (LLMs) to sufficiently mitigate the aforementioned issues. ECG approaches fuzzing Embedded OS by automatically generating input specifications based on readily available source code and documentation, instrumenting and intercepting execution behavior for directional guidance information, and generating inputs with payloads according to the pregenerated input specifications and directional hints provided from previous runs. These methods are empowered by using an interactive refinement method to extract the most from LLMs while using established parsing checkers to validate the outputs. Our evaluation results demonstrate that ECG uncovered 32 new vulnerabilities across three popular open-source Embedded OS (RT-Linux, RaspiOS, and OpenWrt) and detected ten bugs in a commercial Embedded OS running on an actual device. Moreover, compared to Syzkaller, Moonshine, KernelGPT, Rtkaller, and DRLF, ECG has achieved additional kernel code coverage improvements of 23.20%, 19.46%, 10.96%, 15.47%, and 11.05%, respectively, with an overall average improvement of 16.02%. These results underscore ECG’s enhanced capability in uncovering vulnerabilities, thus contributing to the overall robustness and security of the Embedded OS.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4238-4249"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detecting Spoofed Noisy Speeches via Activation-Based Residual Blocks for Embedded Systems","authors":"Jinyu Zhan;Suidi Peng;Wei Jiang;Xiang Wang;Jiarui Liu","doi":"10.1109/TCAD.2024.3437331","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3437331","url":null,"abstract":"Spoofed noisy speeches seriously threaten the speech-based embedded systems, such as smartphones and intelligent assistants. Consequently, we present an anti-spoofing detection model with activation-based residual blocks to identify spoofed noisy speeches with the requirements of high accuracy and low time overhead. Through theoretic analysis of noise propagation on shortcut connections of traditional residual blocks, we observe that different activation functions can help reducing the influence of noise under certain situations. Then, we propose a feature-aware activation function to weaken the influence of noise and enhance the anti-spoofing features on shortcut connections, in which a fine-grained processing is designed to remove noise and strengthen significant features. We also propose a variance-increasing-based optimization algorithm to find the optimal hyperparameters of the feature-aware activation function. Benchmark-based experiments demonstrate that the proposed method can reduce the average equal error rate of anti-spoofing detection from 21.72% to 4.51% and improve the accuracy by up to 37.06% and save up to 91.26% of time overhead on Jetson AGX Xavier compared with ten state-of-the-art methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3985-3996"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142594999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NDPGNN: A Near-Data Processing Architecture for GNN Training and Inference Acceleration","authors":"Haoyang Wang;Shengbing Zhang;Xiaoya Fan;Zhao Yang;Meng Zhang","doi":"10.1109/TCAD.2024.3446871","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446871","url":null,"abstract":"Graph neural networks (GNNs) require a large number of fine-grained memory accesses, which results in inefficient use of bandwidth resources. In this article, we introduce a near-data processing architecture tailored for GNN acceleration, named NDPGNN. NDPGNN provides different operating modes to meet the acceleration needs of various GNN frameworks while ensuring the configurability and scalability of the system. NDPGNN takes advantage of data locality characteristics to repeatedly distribute and utilize data, thereby reducing memory access requirements, and further improving memory access efficiency by combining a subgraph sparse node scheduling strategy with intermediate result reuse. We use data packaging to provide a higher effective data ratio for long-distance data transmission, thereby improving the utilization of the system’s limited bandwidth resources. Compared with the previous method, NDPGNN brings 5.68 times improvement in system performance while reducing energy consumption overhead by 8.49 times.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3997-4008"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing Security and Efficiency: System-Informed Mitigation of Power-Based Covert Channels","authors":"Jeferson González-Gómez;Mohammed Bakr Sikal;Heba Khdr;Lars Bauer;Jörg Henkel","doi":"10.1109/TCAD.2024.3438999","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3438999","url":null,"abstract":"As the digital landscape continues to evolve, the security of computing systems has become a critical concern. Power-based covert channels (e.g., thermal covert channels (TCCs)), a form of communication that exploits the system resources to transmit information in a hidden or unintended manner, have been recently studied as an effective mechanism to leak information between malicious entities via the modulation of CPU power. To this end, dynamic voltage and frequency scaling (DVFS) has been widely used as a countermeasure to mitigate TCCs by directly affecting the communication between the actors. Although this technique has proven effective in neutralizing such attacks, it introduces significant performance and energy penalties, that are particularly detrimental to energy-constrained embedded systems. In this article, we propose different system-informed countermeasures to power-based covert channels from the heuristic and machine learning (ML) domains. Our proposed techniques leverage task migration and DVFS to jointly mitigate the channels and maximize energy efficiency. Our extensive experimental evaluation on two commercial platforms: 1) the NVIDIA Jetson TX2 and 2) Jetson Orin shows that our approach significantly improves the overall energy efficiency of the system compared to the state-of-the-art solution while nullifying the attack at all times.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3395-3406"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hardware and Software Co-Design for Optimized Decoding Schemes and Application Mapping in NVM Compute-in-Memory Architectures","authors":"Shanmukha Mangadahalli Siddaramu;Ali Nezhadi;Mahta Mayahinia;Seyedehmaryam Ghasemi;Mehdi B. Tahoori","doi":"10.1109/TCAD.2024.3447216","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447216","url":null,"abstract":"The computation-in nonvolatile memory (NVM-CiM) approach addresses the growing computational demands and the memory-wall problem faced by traditional processor-centric architectures. Computation-in-memory (CiM) capitalizes on the parallel nature of memory arrays enabling effective computation through multirow memristor reading and sensing. In this context, the conventional design of memory decoders needs to be accordingly modified for efficient multirow activation and parallel data processing. This article presents the design and optimization of address decoders for NVM-CiM system architectures, employing a cross-layer co-optimization approach that integrates circuit and architecture design with application requirements. Our methodology starts at the circuit level, examining various decoder designs, including cascaded, hierarchical, latched, and hybrid models. An in-depth application-level characterization follows, utilizing an extended NVM-CiM-capable gem5 simulator to assess the impact of these decoders on the mapping of CiM-friendly applications and the resulting system performance, particularly in facilitating rapid and efficient activation of multirow memory configurations. This holistic analysis allows us to identify the bottlenecks and requirements from the application side and adjust the design of the decoder accordingly. Our analysis reveals that Hybrid Decoders significantly decrease latency and power consumption compared to other decoder designs within NVM-CiM systems. This highlights the crucial role of the decoder’s row selection flexibility, reducing additional system-level data movement even at the expense of its performance, can substantially improve the overall efficiency of NVM-CiM systems.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3744-3755"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ghostbuster: A Software Approach for Reducing Ghosting Effect on Electrophoretic Displays","authors":"Tao Hu;Menglong Cui;Mingsong Lv;Tao Yang;Yiyang Zhou;Qingxu Deng;Chun Jason Xue;Nan Guan","doi":"10.1109/TCAD.2024.3446711","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446711","url":null,"abstract":"Electrophoretic displays (EPDs), also known as e-paper, offer a paper-like visual experience by reflecting ambient light, making them distinct from traditional LCD or LED displays. They are favored for their eye comfort, energy efficiency, and material flexibility, which make them appealing for a wide range of embedded devices, including eReaders, smartphones, tablets, and wearables. However, EPDs face a significant challenge: the necessity for a fast refresh rate (to maintain an acceptable display performance) introduces a pronounced ghosting effect. This effect results in noticeable color discrepancies between the displayed and source images, harming the user experience and hindering EPDs’ broader application in devices requiring dynamic content display. This article proposes a software-based solution to address the ghosting issue in EPDs. Our approach involves developing analytical models to predict the occurrence of ghosting effects and adjusting the source images to counteract the anticipated color deviations, which can reduce the perceivable ghosts on the display. Experimental evaluation conducted on real-world EPDs validates the effectiveness of our proposed approach in reducing the ghosting effect.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3780-3791"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-Performance Remote Data Persisting for Key-Value Stores via Persistent Memory Region","authors":"Yongping Luo;Peiquan Jin;Xiaoliang Wang;Zhaole Chu;Kuankuan Guo;Jinhui Guo","doi":"10.1109/TCAD.2024.3442992","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3442992","url":null,"abstract":"Key-value stores (KVStores), such as LevelDB and Redis, have been widely used in real-world production environments. To guarantee data durability and availability, traditional KVStores suffer from high write latency, mainly caused by the long network and data-persisting time. To solve this problem, this article presents a novel data-persisting path for KVStores, allowing remote clients to persist data to the KVStore server with μs-level latency. The novelty of this study is threefold. First, we propose PMRDirect, which utilizes a persistent memory region (PMR) in the NVM express standard to construct a direct data-persisting path from the RDMA networking card (NIC) to the PMR region inside an SSD. Second, to showcase PMRDirect in KVStores, we developed a new accessing stack called PMRAccess, enabling remote clients to access existing KVStores and providing durability for each write request. Specifically, we present a low-latency RDMA-based messaging mode and a chunk-based PMR management in PMRAccess to reduce write latency and improve system throughput. Finally, we conducted extensive experiments to evaluate the performance of our proposals. We first compared PMRDirect with a few remote data-persisting paths to show its effectiveness. Then, we evaluated PMRAccess upon two KVStores, including LibCuckoo (an in-memory KVStore) and LevelDB (an in-storage KVStore). The results showed that PMRAccess outperformed the SSD-based accessing stack by up to 6.1× in write throughput and 36× in write tail latency, and it achieved 1.7× higher write throughput and 0.59× lower write tail latency over the PMEM-based accessing stack. Further, we conducted a system-to-system comparison between the PMRAccess-integrated LibCuckoo and Redis, and the results showed our proposal achieved up to 13× higher throughputs and 40× lower write latency than Redis.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3828-3839"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TPE-Det: A Tamper-Proof External Detector via Hardware Traces Analysis Against IoT Malware","authors":"Ziming Zhao;Zhaoxuan Li;Tingting Li;Fan Zhang","doi":"10.1109/TCAD.2024.3444712","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3444712","url":null,"abstract":"With the widespread use of Internet of Things (IoT) devices, malware detection has become a hot spot for both academic and industrial communities. A series of solutions based on system calls, system logs, or hardware performance counters achieve promising results. However, such internal monitors are easily tampered with, especially against adaptive adversaries. In addition, existing system log records typically exhibit substantial volume, resulting in data explosion problems. In this article, we present TPE-Det, a side-channel-based external monitor to cope with these issues. Specifically, TPE-Det leverages the serial peripheral interface bus to extract the on-chip traces and designs a recovery pipeline for operating logs. The advantages of this external monitor are adversary-unperceived and tamper-proof. The restored logs mainly include file operation commands, which are lightweight compared to complete records. Meanwhile, we deploy a series of machine learning models with respect to statistical, sequence, and graph features to identify malware. Empirical evaluation shows that our proposal has tamper-proof capability, high-detection accuracy, and low-time/space overhead compared to state-of-the-art methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3455-3466"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}