IEEE Transactions on Computers: Latest Publications

Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-28, DOI: 10.1109/TC.2024.3449749, vol. 73, no. 11, pp. 2474-2487
Xin You;Hailong Yang;Siqi Wang;Tao Peng;Chen Ding;Xinyuan Li;Bangduo Chen;Zhongzhi Luan;Tongxuan Liu;Yong Li;Depei Qian
Abstract: Recommendation serving with deep learning models is one of the most valuable services of modern e-commerce companies. In production, accommodating billions of recommendation queries under stringent service-level agreements requires high-performance recommendation serving systems. Unfortunately, existing model-serving frameworks fail to achieve efficient serving due to unique challenges such as 1) the input format mismatch between service needs and the model's ability and 2) heavy software contention when concurrently executing the constrained operations. To address these challenges, we propose RecServe, a high-performance serving system for recommendation built around two optimized designs: structured features and SessionGroups. With structured features, RecServe packs single-user-multiple-candidates inputs by semi-automatically transforming computation graphs with annotated input tensors, which significantly reduces redundant network transmission, data movement, and useless computation. With SessionGroups, RecServe further adopts resource isolation across multiple compute streams and a cost-aware operator scheduler with a critical-path-based scheduling policy to enable concurrent kernel execution, further improving serving throughput. Experimental results demonstrate that RecServe achieves maximum speedups of 12.3× and 22.0× over the state-of-the-art serving system on CPU and GPU platforms, respectively.
Citations: 0
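
To illustrate the structured-feature idea, packing one user's features with many candidate items instead of replicating the user features per candidate, here is a minimal NumPy sketch. The two-tower split, the shapes, and the scoring function are illustrative assumptions, not RecServe's actual graph transformation; the point is only that the user-side work is computed once and broadcast.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CANDIDATES, USER_DIM, ITEM_DIM, HIDDEN = 512, 64, 32, 16
W_user = rng.standard_normal((USER_DIM, HIDDEN))   # user-side tower (toy: one matmul)
W_item = rng.standard_normal((ITEM_DIM, HIDDEN))   # candidate-side tower

user_feat = rng.standard_normal(USER_DIM)                    # one user
item_feats = rng.standard_normal((N_CANDIDATES, ITEM_DIM))   # many candidate items

# Naive serving layout: user features are tiled once per candidate,
# so the user tower is re-evaluated N_CANDIDATES times.
tiled_user = np.tile(user_feat, (N_CANDIDATES, 1))           # (N, USER_DIM), redundant copies
scores_naive = ((tiled_user @ W_user) * (item_feats @ W_item)).sum(axis=1)

# "Structured" layout: user-side work is done once and broadcast when it meets
# the candidate-side results, avoiding redundant transfers and compute.
user_hidden = user_feat @ W_user                             # (HIDDEN,) computed once
scores_packed = ((item_feats @ W_item) * user_hidden).sum(axis=1)

assert np.allclose(scores_naive, scores_packed)
print("max abs diff:", np.max(np.abs(scores_naive - scores_packed)))
```
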
Falic: An FPGA-Based Multi-Scalar Multiplication Accelerator for Zero-Knowledge Proof
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449121, vol. 73, no. 12, pp. 2791-2804
Yongkui Yang;Zhenyan Lu;Jingwei Zeng;Xingguo Liu;Xuehai Qian;Zhibin Yu
Abstract: In this paper, we propose Falic, a novel FPGA-based accelerator for multi-scalar multiplication (MSM), the most time-consuming phase of zk-SNARK proof generation. Falic introduces three techniques. First, it leverages a globally asynchronous, locally synchronous (GALS) strategy to build multiple small, lightweight MSM cores that parallelize the independent inner-product computations over different portions of the scalar vector and point vector. Second, each MSM core contains just one large-integer modular multiplier (LIMM), which is multiplexed to perform the point additions (PADDs) generated during MSM. We strike a balance between throughput and hardware cost by batching an appropriate number of PADDs and selecting a PADD computation graph with a suitable degree of parallelism. Finally, performance is further improved by a simple cache structure that enables computation reuse. We implement Falic on two FPGAs with different hardware resources, the Xilinx U200 and Xilinx U250. Compared to the prior FPGA-based accelerator, Falic improves MSM throughput by 3.9×. Experimental results also show that Falic achieves a throughput speedup of up to 1.62× and saves as much as 8.5× energy compared to an RTX 2080Ti GPU.
Citations: 0
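
Structurally, the parallelization described above, splitting the scalar and point vectors into portions handled by independent MSM cores and merging the partial sums, can be sketched in plain Python. Real MSM operates on elliptic-curve points; the sketch substitutes modular integer addition for point addition so it stays self-contained, and only the chunking and the double-and-add schedule are meant to mirror the idea.

```python
from functools import reduce

P = 2**61 - 1  # toy group: integers mod a prime under addition stand in for EC points

def scalar_mul(k: int, point: int) -> int:
    """Double-and-add, the same schedule a PADD/PDBL datapath would follow."""
    acc, addend = 0, point
    while k:
        if k & 1:
            acc = (acc + addend) % P      # "point addition"
        addend = (addend + addend) % P    # "point doubling"
        k >>= 1
    return acc

def msm_core(scalars, points):
    """One 'MSM core': inner product over its portion of the vectors."""
    return reduce(lambda a, b: (a + b) % P,
                  (scalar_mul(k, pt) for k, pt in zip(scalars, points)), 0)

def msm(scalars, points, n_cores=4):
    """Split the vectors across cores, then merge the partial sums."""
    chunk = (len(scalars) + n_cores - 1) // n_cores
    partials = [msm_core(scalars[i:i + chunk], points[i:i + chunk])
                for i in range(0, len(scalars), chunk)]
    return reduce(lambda a, b: (a + b) % P, partials, 0)

scalars = [123456789 * (i + 1) % P for i in range(1000)]
points = [987654321 * (i + 7) % P for i in range(1000)]
assert msm(scalars, points) == msm_core(scalars, points)  # partitioning preserves the result
```
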
HGNAS: Hardware-Aware Graph Neural Architecture Search for Edge Devices
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449108, vol. 73, no. 12, pp. 2693-2707
Ao Zhou;Jianlei Yang;Yingjie Qi;Tong Qiao;Yumeng Shi;Cenlin Duan;Weisheng Zhao;Chunming Hu
Abstract: Graph Neural Networks (GNNs) are becoming increasingly popular for graph-based learning tasks such as point cloud processing due to their state-of-the-art (SOTA) performance. Nevertheless, the research community has primarily focused on improving model expressiveness, with little consideration of how to design efficient GNN models for edge scenarios with real-time requirements and limited resources. Examining existing GNN models reveals varied execution behavior across platforms and frequent Out-Of-Memory (OOM) problems, highlighting the need for hardware-aware GNN design. To address this challenge, this work proposes HGNAS, a hardware-aware graph neural architecture search framework tailored to resource-constrained edge devices. To achieve hardware awareness, HGNAS integrates an efficient GNN hardware performance predictor that evaluates the latency and peak memory usage of GNNs in milliseconds. Meanwhile, we study GNN memory usage during inference and propose a peak memory estimation method, enhancing the robustness of architecture evaluations when combined with predictor outcomes. Furthermore, HGNAS constructs a fine-grained design space to enable the exploration of extreme-performance architectures by decoupling the GNN paradigm. In addition, a multi-stage hierarchical search strategy is leveraged to navigate the huge candidate space, reducing the time of a single search to a few GPU hours. To the best of our knowledge, HGNAS is the first automated GNN design framework for edge devices, and also the first work to achieve hardware awareness of GNNs across different platforms. Extensive experiments across various applications and edge devices demonstrate the superiority of HGNAS. It achieves up to a 10.6× speedup and an 82.5% peak memory reduction with negligible accuracy loss compared to DGCNN on ModelNet40.
Citations: 0
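
The hardware-aware part of such a search can be pictured as a loop that scores sampled architectures with a fast latency/peak-memory predictor and rejects any candidate that violates the device budget before training. The candidate encoding, the closed-form predictor stand-ins, and the budgets below are placeholders, not HGNAS's learned predictor or hierarchical search.

```python
import random

random.seed(0)

# A candidate GNN architecture as a toy encoding: (num_layers, hidden_dim, fanout).
def sample_candidate():
    return (random.choice([2, 3, 4]),
            random.choice([32, 64, 128]),
            random.choice([8, 16, 32]))

# Stand-ins for a learned hardware performance predictor: they map an
# architecture encoding to estimated latency (ms) and peak memory (MB).
def predict_latency_ms(arch):
    layers, hidden, fanout = arch
    return 0.4 * layers * (hidden / 32) * (fanout / 8)

def predict_peak_mem_mb(arch):
    layers, hidden, fanout = arch
    return 1.5 * layers + 0.02 * hidden * fanout

def proxy_accuracy(arch):
    layers, hidden, fanout = arch          # bigger models score higher in this toy proxy
    return layers * 0.1 + hidden * 0.001 + fanout * 0.002

LATENCY_BUDGET_MS, MEMORY_BUDGET_MB = 3.0, 12.0

feasible = []
for _ in range(200):                       # sample-and-filter search loop
    arch = sample_candidate()
    if (predict_latency_ms(arch) <= LATENCY_BUDGET_MS
            and predict_peak_mem_mb(arch) <= MEMORY_BUDGET_MB):
        feasible.append(arch)

best = max(feasible, key=proxy_accuracy)
print("best feasible architecture:", best)
```
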
Enabling Efficient Deep Learning on MCU With Transient Redundancy Elimination
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449102, vol. 73, no. 12, pp. 2649-2663
Jiesong Liu;Feng Zhang;Jiawei Guan;Hsin-Hsuan Sung;Xiaoguang Guo;Saiqin Long;Xiaoyong Du;Xipeng Shen
Abstract: Deploying deep neural networks (DNNs) with satisfactory performance in resource-constrained environments is challenging. This is especially true of microcontrollers due to their tight space and limited computational capability. However, there is a growing demand for DNNs on microcontrollers, as executing large DNNs on microcontrollers is critical to reducing energy consumption, increasing performance efficiency, and eliminating privacy concerns. This paper presents a novel and systematic data redundancy elimination method that implements efficient DNNs on microcontrollers through innovations in computation and space optimization. By making the optimization itself a trainable component of the target neural network, the method maximizes performance benefits while keeping DNN accuracy stable. Experiments are performed on two microcontroller boards with three popular DNNs: CifarNet, ZfNet, and SqueezeNet. The results show that this solution eliminates more than 96% of the computations in the DNNs and makes them fit well on microcontrollers, yielding a 3.4-5× speedup with little loss of accuracy.
Citations: 0
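
The abstract's central twist is that the elimination decision is itself trainable. One common way to express that, shown here purely as a generic sketch and not as this paper's formulation, is a learned per-channel gate trained with a straight-through estimator plus a penalty on the fraction of channels kept, so the network learns which computations can be skipped.

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Conv layer whose output channels can be switched off by a learned gate."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Conv2d(cin, cout, 3, padding=1)
        self.gate_logits = nn.Parameter(torch.zeros(cout))  # one gate per output channel

    def forward(self, x):
        soft = torch.sigmoid(self.gate_logits)
        hard = (soft > 0.5).float()
        # Straight-through estimator: forward uses the hard 0/1 mask,
        # the gradient flows through the soft sigmoid.
        gate = hard + soft - soft.detach()
        return self.conv(x) * gate.view(1, -1, 1, 1), soft.mean()

torch.manual_seed(0)
layer = GatedConv(3, 16)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
x = torch.randn(8, 3, 32, 32)
target = torch.randn(8, 16, 32, 32)

for _ in range(50):
    out, keep_ratio = layer(x)
    # Task loss plus a penalty on the fraction of channels (i.e., compute) kept.
    loss = nn.functional.mse_loss(out, target) + 0.1 * keep_ratio
    opt.zero_grad()
    loss.backward()
    opt.step()

print("channels kept:", int((torch.sigmoid(layer.gate_logits) > 0.5).sum()))
```
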
BiRD: Bi-Directional Input Reuse Dataflow for Enhancing Depthwise Convolution Performance on Systolic Arrays
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449103, vol. 73, no. 12, pp. 2708-2721
Mingeon Park;Seokjin Hwang;Hyungmin Cho
Abstract: Depthwise convolution (DWConv) is an effective technique for reducing the size and computational requirements of convolutional neural networks. However, DWConv's input reuse pattern does not transform easily into dense matrix multiplications, leading to low utilization of processing elements (PEs) on existing systolic arrays. In this paper, we introduce BiRD, a novel systolic array dataflow mechanism designed to maximize input reuse and boost DWConv performance. BiRD exploits two directions of input reuse and requires only minor modifications to a typical weight-stationary systolic array. We evaluate BiRD on the Gemmini platform, comparing it with existing dataflow types. The results show that BiRD significantly reduces computation time while incurring minimal area overhead and improved energy consumption compared to other dataflow types. For example, on a 32×32 systolic array, it incurs a 9.8% area overhead, significantly smaller than that of other dataflow types for DWConv. Compared to matrix-multiplication-based DWConv, BiRD achieves a 4.7× performance improvement for the DWConv layers of MobileNet-V2, resulting in a 55.8% reduction in total inference computation time and a 44.9% reduction in energy consumption. Our results highlight the effectiveness of BiRD in enhancing DWConv performance on systolic arrays.
Citations: 0
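
Why DWConv maps poorly onto a matmul-oriented systolic array is easy to see when it is written as a matrix multiplication: each input channel touches only its own filter, so the weight matrix is block-diagonal and mostly zeros, and most PEs do useless work. The NumPy sketch below contrasts the direct per-channel computation with that padded matmul form; the shapes are illustrative and nothing here describes BiRD's actual dataflow.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W, K = 4, 6, 6, 3                     # channels, spatial size, kernel size
x = rng.standard_normal((C, H, W))
w = rng.standard_normal((C, K, K))          # one KxK filter per channel (depthwise)

def im2col(img, k):
    """Collect all KxK patches of a single channel as rows."""
    out_h, out_w = img.shape[0] - k + 1, img.shape[1] - k + 1
    return np.stack([img[i:i + k, j:j + k].ravel()
                     for i in range(out_h) for j in range(out_w)])

# Direct depthwise convolution: channel c only ever meets filter c.
direct = np.stack([im2col(x[c], K) @ w[c].ravel() for c in range(C)])

# Matmul form on a dense engine: patches from all channels are concatenated and
# multiplied by a block-diagonal weight matrix. Only 1/C of its entries are
# non-zero, which is the PE under-utilization the paper targets.
patches = np.concatenate([im2col(x[c], K) for c in range(C)], axis=1)  # (P, C*K*K)
W_blockdiag = np.zeros((C * K * K, C))
for c in range(C):
    W_blockdiag[c * K * K:(c + 1) * K * K, c] = w[c].ravel()
as_matmul = patches @ W_blockdiag                                      # (P, C)

assert np.allclose(direct.T, as_matmul)
print("non-zero fraction of weight matrix:", np.count_nonzero(W_blockdiag) / W_blockdiag.size)
```
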
Joint Pruning and Channel-Wise Mixed-Precision Quantization for Efficient Deep Neural Networks
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449084, vol. 73, no. 11, pp. 2619-2633
Beatrice Alessandra Motetti;Matteo Risso;Alessio Burrello;Enrico Macii;Massimo Poncino;Daniele Jahier Pagliari
Abstract: The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which improve latency and memory occupation. These optimization techniques are usually applied independently. We propose a novel methodology for applying them jointly via a lightweight gradient-based search, in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks: CIFAR-10, Google Speech Commands, and Tiny ImageNet. When targeting the optimization of the memory footprint, we achieve size reductions of 47.50% and 69.54% at iso-accuracy with baseline networks whose weights are all quantized at 8 and 2 bits, respectively. Our method surpasses a previous state-of-the-art approach with up to 56.17% size reduction at iso-accuracy. With respect to the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results, but with significantly lower training time. In addition, we show how well-tailored cost models can improve the cost-versus-accuracy trade-offs when targeting specific hardware for deployment.
Citations: 0
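
The joint gradient-based search can be pictured as attaching, to each channel, a differentiable choice over candidate bit-widths, with 0 bits playing the role of pruning, and adding the resulting cost to the task loss. The PyTorch sketch below is a generic illustration of that idea with a softmax relaxation, a straight-through fake quantizer, and made-up sizes; it is not the authors' exact parameterization or cost model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

BITS = torch.tensor([0., 2., 4., 8.])      # 0 bits == channel pruned away

class PrecisionSearchLinear(nn.Module):
    """Linear layer whose per-output-channel precision is a trainable choice."""
    def __init__(self, cin, cout):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(cout, cin) * 0.1)
        self.logits = nn.Parameter(torch.zeros(cout, len(BITS)))  # one choice per channel

    def fake_quant(self, w, bits):
        if bits == 0:
            return torch.zeros_like(w)                  # pruned: contributes nothing
        scale = w.abs().max().detach() / (2 ** (bits - 1) - 1) + 1e-8
        q = torch.round(w / scale).clamp(-2 ** (bits - 1), 2 ** (bits - 1) - 1) * scale
        return w + (q - w).detach()                     # straight-through estimator

    def forward(self, x):
        probs = F.softmax(self.logits, dim=1)           # (cout, n_bits)
        # Expected quantized weight: mixture over the candidate precisions.
        w_mix = sum(probs[:, i:i + 1] * self.fake_quant(self.weight, int(b))
                    for i, b in enumerate(BITS))
        cost = (probs * BITS).sum() * self.weight.shape[1]  # expected weight-memory bits
        return x @ w_mix.t(), cost

torch.manual_seed(0)
layer = PrecisionSearchLinear(32, 16)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)
x, y = torch.randn(64, 32), torch.randn(64, 16)

for _ in range(100):
    out, cost = layer(x)
    loss = F.mse_loss(out, y) + 1e-5 * cost   # accuracy versus memory trade-off
    opt.zero_grad()
    loss.backward()
    opt.step()

print("per-channel bit choice:", BITS[layer.logits.argmax(dim=1)].tolist())
```
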
Optimized Quantum Circuit of AES With Interlacing-Uncompute Structure
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449094, vol. 73, no. 11, pp. 2563-2575
Mengyuan Zhang;Tairong Shi;Wenling Wu;Han Sui
Abstract: In the post-quantum era, the security level of encryption algorithms is often evaluated based on the quantum resources required to attack AES. In this work, we thoroughly estimate various performance metrics of the quantum circuits of AES-128/192/256. First, we introduce a generic round structure for in-place implementation of the AES algorithm that maximizes the parallelism between nonlinear components. Specifically, when employed as an encryption oracle, our structure reduces the T-depth from 2rd to (r+1)d. Furthermore, by leveraging the properties of block-cyclic matrices, we present an in-place implementation circuit for MixColumn with depth 10 that uses 105 CNOT gates. For the S-box, we assess its minimum circuit width at different T-depths and provide multiple versions of circuit implementations for a depth-width trade-off. Finally, based on our optimized S-box circuit, we conduct a comprehensive analysis of the implementation complexity of different round structures, where our structure exhibits significant advantages in terms of low T-depth.
Citations: 0
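
To make the T-depth claim concrete, assume r is the number of AES rounds (10, 12, and 14 for AES-128/192/256) and d is the T-depth contributed by one layer of S-box evaluations (an assumption about the abstract's notation). Plugging in gives:

```latex
\begin{aligned}
\text{AES-128 } (r=10):&\quad 2rd = 20d \;\longrightarrow\; (r+1)d = 11d\\
\text{AES-192 } (r=12):&\quad 2rd = 24d \;\longrightarrow\; (r+1)d = 13d\\
\text{AES-256 } (r=14):&\quad 2rd = 28d \;\longrightarrow\; (r+1)d = 15d
\end{aligned}
```
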
SCARF: Securing Chips With a Robust Framework Against Fabrication-Time Hardware Trojans
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449082, vol. 73, no. 12, pp. 2761-2775
Mohammad Eslami;Tara Ghasempouri;Samuel Pagliarini
Abstract: The globalization of the semiconductor industry has introduced security challenges to Integrated Circuits (ICs), particularly the threat of Hardware Trojans (HTs): malicious logic that can be introduced during IC fabrication. While significant effort is directed towards verifying the correctness and reliability of ICs, their security is often overlooked. In this paper, we propose a comprehensive framework that integrates a suite of methodologies for both the front-end and back-end stages of design, aimed at enhancing IC security. First, we outline a systematic methodology for transforming existing verification assets into potent security checkers by repurposing verification assertions. To further improve security, we introduce a methodology for integrating online monitors during physical synthesis, a back-end insertion that provides an additional layer of defense. Experimental results demonstrate a significant increase in security, measured by our introduced metric, Security Coverage (SC), with a marginal rise in area and power consumption, typically under 20%. The insertion of online monitors during physical synthesis enhances security metrics by up to 33.5%. This holistic framework offers a comprehensive defense mechanism across the entire spectrum of IC design.
Citations: 0
Single-Key Attack on Full-Round Shadow Designed for IoT Nodes
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449040, vol. 73, no. 12, pp. 2776-2790
Yuhan Zhang;Wenling Wu;Lei Zhang;Yafei Zheng
Abstract: With the rapid advancement of the Internet of Things (IoT), many innovative lightweight block ciphers have been introduced to meet the stringent security demands of IoT devices. Among these, the Shadow cipher stands out for its compactness, making it particularly well suited for deployment in resource-constrained IoT nodes (IEEE Internet of Things Journal, 2021). This paper demonstrates two real-time attacks on Shadow for the first time: real-time plaintext recovery and key recovery. First, numerous properties of Shadow are discussed, illustrating an equivalent representation of two-round Shadow and the relationship between the round keys. Second, we introduce multiple two-round iterative linear approximations. Employing these approximations enables the derivation of full-round linear distinguishers. Moreover, we uncover numerous linear relationships between plaintext and ciphertext, from which real-time plaintext recovery is achievable. On average, it takes 5 seconds to recover the plaintext for a fixed ciphertext of Shadow-32. Third, many properties of the propagation of differences through the SIMON-like function are illustrated. Based on these properties, various differential distinguishers up to the full number of rounds are presented, allowing real-time key recovery. Specifically, the 64-bit master key of Shadow-32 can be retrieved in around two days on average. Experiments verify all our results.
Citations: 0
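
Linear distinguishers of this kind are built from approximations of the non-linear round function: a linear combination of input bits that predicts a linear combination of output bits with a measurable bias. The sketch below empirically estimates the bias of one textbook approximation of a SIMON-like function, where an output bit is approximated by the bit contributed by the linear rotate-by-2 term alone (the AND of two rotated bits is 0 with probability 3/4). The word size and masks are illustrative, not the distinguishers used against Shadow.

```python
import random

WORD = 16
MASK = (1 << WORD) - 1

def rol(x, r):
    """Rotate a WORD-bit value left by r positions."""
    return ((x << r) | (x >> (WORD - r))) & MASK

def simon_like_f(x):
    """SIMON-style non-linear function: (x<<<1 & x<<<8) ^ (x<<<2)."""
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)

def parity(x):
    return bin(x).count("1") & 1

random.seed(0)
TRIALS = 200_000
in_mask = 1 << 14   # input bit 14: the bit the x<<<2 rotation moves into output bit 0
out_mask = 1        # output bit 0 of f(x)

# Count how often the linear relation <in_mask, x> = <out_mask, f(x)> holds.
agree = sum(parity(x & in_mask) == parity(simon_like_f(x) & out_mask)
            for x in (random.getrandbits(WORD) for _ in range(TRIALS)))
bias = agree / TRIALS - 0.5
print(f"empirical bias: {bias:+.3f} (expected about +0.25)")
```
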
ROLoad-PMP: Securing Sensitive Operations for Kernels and Bare-Metal Firmware
IF 3.6, CAS Q2, Computer Science
IEEE Transactions on Computers, Pub Date: 2024-08-23, DOI: 10.1109/TC.2024.3449105, vol. 73, no. 12, pp. 2722-2733
Wende Tan;Chenyang Li;Yangyu Chen;Yuan Li;Chao Zhang;Jianping Wu
Abstract: A common way for attackers to compromise victim systems is to hijack sensitive operations (e.g., control-flow transfers) with attacker-controlled inputs. Existing solutions generally protect only part of these targets and have high performance overheads, which makes them impractical and hard to deploy on systems with limited resources (e.g., IoT devices) or for low-level software such as kernels and bare-metal firmware. In this paper, we present ROLoad-PMP, a lightweight hardware-software co-design solution that protects sensitive operations in low-level software from being hijacked. First, we propose new instructions that load data only from read-only memory regions tagged with specific keys, guaranteeing the integrity of the pointees referenced by (potentially corrupted) data pointers. Then, we provide a program hardening mechanism that protects sensitive operations by classifying their operands and placing them into read-only memory with different keys at compile time, and loading them with ROLoad-PMP-family instructions at runtime. We have implemented an FPGA-based prototype of ROLoad-PMP on RISC-V and demonstrated an important defensive application: forward-edge control-flow integrity. Results show that ROLoad-PMP costs few extra hardware resources (<1.40%). Moreover, it enables many lightweight defenses (e.g., with negligible overheads <0.853%) and provides broader and stronger security guarantees than existing hardware solutions such as ARM BTI and Intel CET.
Citations: 0