IEEE Transactions on Computers: Latest Articles

SmartZone: Runtime Support for Secure and Efficient On-Device Inference on ARM TrustZone
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-04-08 DOI: 10.1109/TC.2025.3557971
Zhaolong Jian; Xu Liu; Qiankun Dong; Longkai Cheng; Xueshuo Xie; Tao Li
Abstract: On-device inference is a burgeoning paradigm that performs model inference locally on end devices, allowing private data to remain local. ARM TrustZone, a widely supported trusted execution environment (TEE), has been applied to provide confidentiality protection for on-device inference. However, with the rise of large-scale models such as large language models (LLMs), TrustZone-based on-device inference faces two challenges: difficult migration and inefficient execution. The rudimentary TEE OS on TrustZone lacks both the inference runtime needed for building models and the parallel support necessary to accelerate inference. Moreover, the limited secure memory resources on end devices further constrain the model size and degrade performance. In this paper, we propose SmartZone to provide runtime support for secure and efficient on-device inference on TrustZone. SmartZone consists of three main components: (1) a trusted inference-oriented operator set, which provides the underlying mechanisms adapted to TrustZone's execution mode for trusted inference of DNN models and LLMs; (2) proactive multi-threading parallel support, which increases the number of CPU cores in the secure state via cross-world thread collaboration to achieve parallelism; and (3) an on-demand secure memory management method, which statically allocates the appropriate secure memory size based on pre-execution resource analysis. We implement a prototype of SmartZone on the Raspberry Pi 3B+ board and evaluate it on four well-known DNN models and the llama2 LLM. Extensive experimental results show that SmartZone provides end-to-end protection for on-device inference while maintaining excellent performance. Compared to the original trusted inference, SmartZone accelerates inference by up to $4.26\times$ and reduces energy consumption by $65.81\%$.
Vol. 74, No. 6, pp. 2144-2158 (Journal Article)
Citations: 0
Efficient Quantum Secure Vector Dominance and Its Applications in Computational Geometry
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-04-04 DOI: 10.1109/TC.2025.3557968
Wenjie Liu; Bingmei Su; Feiyang Sun
Abstract: Secure vector dominance (SVD) is a key cryptographic primitive in secure computational geometry (SCG): it determines the dominance relationship between the vectors of two participants without revealing their private information. However, the security of traditional SVD protocols is compromised by the formidable computational power of quantum computing, and their efficiency needs further improvement. To address these challenges, an efficient quantum secure vector dominance (QSVD) protocol is proposed. Specifically, we first introduce a quantum private permutation (QPP) subprotocol to shuffle the elements of each participant's private input vector. To further facilitate secure data comparison, we propose an enhanced quantum millionaire subprotocol with equality determination functionality, building upon Jia's original protocol. Based on these two subprotocols, we propose a QSVD protocol with polynomial complexity that derives vector dominance in a single interaction with a semi-honest third party. Performance analyses confirm that the QSVD protocol is correct, resilient against malicious attacks, and retains polynomial computational complexity, ensuring both security and efficiency. To demonstrate the scalability of the QSVD protocol, we illustrate its applications in several geometric computation problems, such as point-line inclusion determination, line-line intersection determination, and point-in-polygon determination. Finally, we validate the feasibility of our protocol by conducting comprehensive simulations on IBM's Qiskit platform, demonstrating its practical applicability and effectiveness in real quantum computing environments.
Vol. 74, No. 6, pp. 2129-2143 (Journal Article)
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10949787
Citations: 0
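Editor's sketch: the functionality that a secure vector dominance protocol computes can be stated in a few lines of plaintext code. The snippet below is only that plaintext reference, with no privacy and nothing quantum in it; it assumes the common convention that A dominates B when every component of A is greater than or equal to the matching component of B, which the abstract does not spell out. The name `dominates` is hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Plaintext reference: True when a[i] >= b[i] for every index i.

    Assumption: non-strict componentwise comparison. A secure protocol
    would produce the same bit without either party seeing the other's
    vector.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return all(x >= y for x, y in zip(a, b))
```

A secure protocol is usually judged against exactly this kind of reference: the output bit must match, while the inputs stay hidden from the other party.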
RSQC: Recursive Sparse QUBO Construction for Quantum Annealing Machines
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-04-04 DOI: 10.1109/TC.2025.3557965
Jianwen Luo; Yuhao Shu; Yajun Ha
Abstract: Quantum annealing algorithms have shown commercial potential in solving some instances of combinatorial optimization problems. However, existing mappings of general optimization problems into a format compatible with quantum annealing yield dense topologies and complicated weighting, which limits the size of solvable problems on practical quantum annealing machines. To address this issue, we propose a novel mapping framework with three new techniques. First, to address the issue arising from general constraints, we introduce a recursive methodology to map constraints into interconnected Boolean gates and small algebraic cliques, which yields sparse topology and hardware-friendly biases/interactions. Second, to better address frequently used constraints, we introduce a specialized penalty set based on this methodology with detailed optimizations. Third, to address the issue arising from the objective, we reformulate the complicated objective into a single multi-bit variable and apply binary search to its range, which turns each search step into a constraint-only problem. Compared with the state of the art, experimental results and analysis over an exhaustive scan of operand bit-widths from 1 to 64 show that: (1) the growth order of the number of physical qubits with regard to operand bit-width $w$ is reduced from $O(w^{2})$ to $O(w)$, while the number is reduced by a factor of $10^{-1}$ in the best case; (2) the dynamic range of biases/interactions is reduced from $O(2^{2w})$ to $<32$; (3) the graph minor embedding run time is reduced by a factor of $10^{-2}$ in the best case. For the same optimization problem, our framework reduces the required number of physical qubits and machine precision, and shortens the time from problem to machine.
Vol. 74, No. 6, pp. 2114-2128 (Journal Article)
Citations: 0
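Editor's sketch: the third technique in the abstract (binary search over the objective's range, so that each step is a constraint-only question) can be illustrated on a toy problem. Everything here is an assumption made for illustration: the brute-force `feasible` function stands in for the annealer, and the names `feasible` and `minimize_by_binary_search` are hypothetical, not taken from the paper.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Assignment = Tuple[int, ...]


def feasible(num_bits: int,
             constraint: Callable[[Assignment], bool],
             objective: Callable[[Assignment], int],
             bound: int) -> Optional[Assignment]:
    """Stand-in for the solver: return any assignment that satisfies the
    constraint and has objective <= bound, or None if there is none."""
    for x in product((0, 1), repeat=num_bits):
        if constraint(x) and objective(x) <= bound:
            return x
    return None


def minimize_by_binary_search(num_bits: int,
                              constraint: Callable[[Assignment], bool],
                              objective: Callable[[Assignment], int],
                              lo: int, hi: int
                              ) -> Optional[Tuple[int, Assignment]]:
    """Find the smallest bound in [lo, hi] that is still feasible.

    Each iteration asks a yes/no constraint-only question; the objective
    appears only as the extra constraint objective(x) <= mid.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        x = feasible(num_bits, constraint, objective, mid)
        if x is not None:
            best = (objective(x), x)
            hi = mid - 1   # try a tighter bound
        else:
            lo = mid + 1   # bound too tight, relax it
    return best


# Toy instance: pick at least two items while minimizing total weight.
WEIGHTS = (3, 5, 2, 7)


def at_least_two(x: Assignment) -> bool:
    return sum(x) >= 2


def total_weight(x: Assignment) -> int:
    return sum(w * b for w, b in zip(WEIGHTS, x))


result = minimize_by_binary_search(4, at_least_two, total_weight,
                                   0, sum(WEIGHTS))
```

The search needs a number of oracle calls logarithmic in the width of the objective range, which is why it pairs naturally with an objective encoded as a single multi-bit variable.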
An FPGA-Based Open-Source Hardware-Software Framework for Side-Channel Security Research
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-03-17 DOI: 10.1109/TC.2025.3551936
Davide Zoni; Andrea Galimberti; Davide Galli
Abstract: Attacks based on side-channel analysis (SCA) pose a severe security threat to modern computing platforms, further exacerbated on IoT devices by their pervasiveness and handling of private and critical data. Designing SCA-resistant computing platforms requires significant additional effort in the early stages of the IoT devices' life cycle, which is severely constrained by strict time-to-market deadlines and tight budgets. This manuscript introduces a hardware-software framework meant for SCA research on FPGA targets. It delivers an IoT-class system-on-chip (SoC) that includes a RISC-V CPU, provides observability and controllability through an ad-hoc debug infrastructure to facilitate SCA attacks and evaluate the platform's security, and streamlines the deployment of SCA countermeasures through dedicated hardware and software features such as a DFS actuator and FreeRTOS support. The open-source release of the framework includes the SoC; the scripts to configure the computing platform, compile a target application, and assess the SCA security; and a suite of state-of-the-art attacks and countermeasures. The goal is to foster its adoption and novel developments in the field, empowering designers and researchers to focus on studying SCA countermeasures and attacks while relying on a sound and stable hardware-software platform as the foundation for their research.
Vol. 74, No. 6, pp. 2087-2100 (Journal Article)
Citations: 0
Deep Learning Operators Performance Tuning for Changeable Sized Input Data on Tensor Accelerate Hardware
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-03-17 DOI: 10.1109/TC.2025.3551937
Pengyu Mu; Yi Liu; Rui Wang; Guoxiang Liu; Hangcheng An; Qianhe Zhao; Hailong Yang; Chenhao Xie; Zhongzhi Luan; Chunye Gong; Depei Qian
Abstract: The operator library is the fundamental infrastructure of deep learning acceleration hardware. Automatically generating the library and tuning its performance is promising because manual development by well-trained and skillful programmers is costly in terms of both time and money. Tensor hardware offers the best computing efficiency for deep learning applications, but operator library programs are hard to tune because the tensor hardware primitives have many limitations, which makes it difficult to fully exploit the hardware's performance. Recent advances in LLMs exacerbate this problem because the size of the input data is not fixed. Therefore, mapping the computing tasks of operators to tensor hardware units is a significant challenge when the shape of the input tensor is unknown before runtime. We propose DSAT, a deep learning operator performance autotuning technique for changeable-sized input data on tensor hardware. To match the input tensor's undetermined shape, we choose a group of abstract computing units as the basic building blocks of operators for changeable-sized input tensor shapes. We design a group of program tuning rules to construct a large exploration space of variant implementations of the operator programs. Based on these rules, we construct an intermediate representation of computing and memory access to describe the computing process and use it to map the abstract computing units to tensor primitives. To speed up the tuning process, we narrow down the optimization space by predicting the actual hardware resource requirement and providing an optimized cost model for performance prediction. DSAT achieves performance comparable to the vendor's manually tuned operator libraries. Compared to state-of-the-art deep learning compilers, it improves inference performance by 13% on average and decreases the tuning time by an order of magnitude.
Vol. 74, No. 6, pp. 2101-2113 (Journal Article)
Citations: 0
Enabling Consistent Sensing Data Sharing Among IoT Edge Servers via Lightweight Consensus
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-03-11 DOI: 10.1109/TC.2025.3549616
Xiulong Liu; Zhiyuan Zheng; Hao Xu; Zhelin Liang; Gaowei Shi; Chenyu Zhang; Keqiu Li
Abstract: Blockchain offers distinct advantages in terms of data credibility and provenance certification, and its fusion with Internet of Things (IoT) technology holds great promise. Nevertheless, IoT environments, and sensing environments in particular, are marked by extensive node networks and intricate communication patterns. The conventional blockchain consensus mechanism, hampered by its heavy reliance on computing resources and communication bandwidth, faces difficulties in ensuring seamless data exchange among IoT edge servers. The issues encountered by state-of-the-art Byzantine Fault Tolerance (BFT) consensus include: (i) high communication complexity between nodes; and (ii) the detrimental impact of Byzantine behavior on system performance. To overcome these problems, we propose a lightweight blockchain consensus called AntB, which for the first time introduces the concept of sampling into consensus and significantly reduces the number of participating consensus nodes from $N$ to $n$, lowering the consensus complexity to $2\cdot O(n)+O(N)$. We design a dynamic reputation mechanism so that Byzantine nodes cannot control the sampling set to affect the activity of the consensus in the long term. When implementing AntB, we address three significant technical challenges: (i) to determine the optimal sample size, we propose a sampling calculation method based on statistical confidence intervals, where the sample size is primarily determined by the chosen confidence level and margin of error; (ii) to prevent Byzantine behavior, we devise a weighted random sampling mechanism utilizing reputation coefficients based on edge servers' behaviors; and (iii) to maintain consensus activity and consistency after sampling, we propose a consensus mechanism with partial sampling and global verification to avert potential issues. We implement AntB and conduct performance evaluations on a server with 32 cores and 64 GB of memory. The evaluation results indicate that the more nodes participate in consensus, the better AntB performs. In particular, with 300 nodes, AntB achieves a 24.94% higher success rate than HotStuff and improves transactions per second (TPS) by 102.10%.
Vol. 74, No. 6, pp. 2045-2057 (Journal Article)
Citations: 0
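Editor's sketch: the abstract says the sample size follows from a confidence level and a margin of error, and that committee members are drawn by reputation-weighted random sampling, but it gives no formulas. The code below therefore substitutes two textbook techniques as stand-ins: Cochran's sample-size formula with a finite-population correction, and Efraimidis-Spirakis weighted sampling without replacement. Function names, the default `proportion=0.5`, and the reputation values are all assumptions, not AntB's actual design.

```python
import math
import random
from statistics import NormalDist
from typing import Dict, List


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.05, proportion: float = 0.5) -> int:
    """Cochran's formula with finite-population correction (assumed here)."""
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * proportion * (1 - proportion) / (margin * margin)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


def weighted_sample(reputation: Dict[str, float], k: int,
                    rng: random.Random) -> List[str]:
    """Draw k distinct nodes, favoring higher reputation.

    Efraimidis-Spirakis scheme: each node gets key u ** (1 / weight) with
    u uniform in [0, 1); the k largest keys win.
    """
    if k > len(reputation):
        raise ValueError("cannot sample more nodes than exist")
    if any(w <= 0 for w in reputation.values()):
        raise ValueError("reputation weights must be positive")
    keys = {node: rng.random() ** (1.0 / w) for node, w in reputation.items()}
    return sorted(keys, key=keys.get, reverse=True)[:k]


# Hypothetical network of 300 edge servers with made-up reputations.
reputations = {f"edge-{i}": 1.0 + (i % 5) for i in range(300)}
committee_size = sample_size(len(reputations))
committee = weighted_sample(reputations, committee_size, random.Random(7))
```

Under these assumed defaults (95% confidence, 5% margin), a 300-node network yields a committee of 169 nodes; tightening the margin or raising the confidence level increases that number.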
AR-Light: Enabling Fast and Lightweight Multi-User Augmented Reality via Semantic Segmentation and Collaborative View Synchronization
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-03-10 DOI: 10.1109/TC.2025.3549629
Yu Wen; Aamir Bader Shah; Ruizhi Cao; Chen Zhang; Jiefu Chen; Xuqing Wu; Chenhao Xie; Xin Fu
Abstract: Multi-user Augmented Reality (MuAR) allows multiple users to interact with shared virtual objects, facilitated by exchanging environment information. Current MuAR systems rely on 3D point clouds for real-world analysis, view synchronization, object rendering, and movement tracking. However, the complexity of 3D point clouds leads to significant processing delays, accounting for approximately 80% of the overhead in commercial frameworks. This hampers usability and degrades user experience. Our analysis reveals that maintaining the facing side of the real-world scene in a stable environment provides sufficient information for virtual object placement and rendering. To address this, we introduce a lightweight quadtree structure, representing 2D scenes through semantic segmentation and geometry, as an alternative to 3D point clouds. Additionally, we propose a novel correction method to handle potential shifts in virtual object placement during view synchronization among users. Combining all designs, we implement a fast and lightweight MuAR framework named AR-Light and test it on commercial AR devices. The evaluation results on real-world applications demonstrate that AR-Light can achieve high performance in various real-world scenes while maintaining comparable virtual object placement accuracy.
Vol. 74, No. 6, pp. 2073-2086 (Journal Article)
Citations: 0
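Editor's sketch: to make the quadtree idea concrete, the snippet below builds a plain region quadtree over a small 2D grid of semantic labels, merging any square block whose cells all share one label into a single leaf. It is a generic textbook structure under stated assumptions (square grid, side length a power of two, integer class labels); the `QuadNode` and `build_quadtree` names are hypothetical, and AR-Light's actual node contents are not described in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class QuadNode:
    x: int                        # left column of the block
    y: int                        # top row of the block
    size: int                     # side length of the block
    label: Optional[int] = None   # class label, set only on leaves
    children: List["QuadNode"] = field(default_factory=list)


def build_quadtree(grid: Sequence[Sequence[int]], x: int = 0, y: int = 0,
                   size: Optional[int] = None) -> QuadNode:
    """Recursively split a block until every leaf holds a single label."""
    if size is None:
        size = len(grid)
    first = grid[y][x]
    uniform = all(grid[row][col] == first
                  for row in range(y, y + size)
                  for col in range(x, x + size))
    if uniform:
        return QuadNode(x, y, size, label=first)
    half = size // 2
    children = [build_quadtree(grid, x + dx, y + dy, half)
                for dy in (0, half) for dx in (0, half)]
    return QuadNode(x, y, size, children=children)


def count_leaves(node: QuadNode) -> int:
    if not node.children:
        return 1
    return sum(count_leaves(child) for child in node.children)


# Made-up 4x4 segmentation mask: 0 = wall, 1 = table, 2 = cup.
mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 2, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(mask)
```

The 16 cells of this mask collapse into 7 leaves, and large uniform regions collapse further, which is the general reason a quadtree over a segmented 2D view can be much smaller than a dense representation.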
DESA: Dataflow Efficient Systolic Array for Acceleration of Transformers
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-03-10 DOI: 10.1109/TC.2025.3549621
Zhican Wang; Hongxiang Fan; Guanghui He
Abstract: Transformers have become prevalent in various Artificial Intelligence (AI) applications, spanning natural language processing to computer vision. Owing to their suboptimal performance on general-purpose platforms, various domain-specific accelerators that explore and exploit model sparsity have been developed. Instead, we conduct a quantitative analysis of Transformers to identify key inefficiencies and adopt dataflow optimization to address them. (Transformers can be categorized into three types: Encoder-Only, Decoder-Only, and Encoder-Decoder; this paper focuses on Encoder-Only Transformers.) These inefficiencies arise from 1) diverse matrix multiplication, 2) multi-phase non-linear operations and their dependencies, and 3) heavy memory requirements. We introduce a novel dataflow design to support decoupling with latency hiding, effectively reducing the dependencies and addressing the performance bottlenecks of non-linear operations. To enable fully fused attention computation, we propose practical tiling and mapping strategies to sustain high throughput and notably decrease memory requirements from $O(N^{2}H)$ to $O(N)$. A hybrid buffer-level reuse strategy is also introduced to enhance utilization and diminish off-chip access. Based on these optimizations, we propose a novel systolic array design, named DESA, with three innovations: 1) a reconfigurable vector processing unit (VPU) and immediate processing units (IPUs) that can be seamlessly fused within the systolic array to support various normalization, post-processing, and transposition operations with efficient latency hiding; 2) a hybrid stationary systolic array that improves the compute and memory efficiency for matrix multiplications with diverse operational intensity and characteristics; 3) a novel tile fusion processing scheme that efficiently addresses the low utilization of conventional systolic arrays during data setup and offloading. Across various benchmarks, extensive experiments demonstrate that DESA achieves $5.0\times \sim 8.3\times$ energy savings over a 3090 GPU and $25.6\times \sim 88.4\times$ over an Intel 6226R CPU. Compared to the SOTA accelerators, DESA achieves $11.6\times \sim 15.0\times$ speedup and up to $2.3\times$ energy savings.
Vol. 74, No. 6, pp. 2058-2072 (Journal Article)
Citations: 0
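Editor's sketch: the abstract does not say how fused attention avoids the quadratic buffer. One widely known way to compute attention tile by tile without ever storing a full row of scores is the online (streaming) softmax, shown below for a single query in pure Python. This is offered only as an illustration of why tiling can shrink intermediate storage; it is not claimed to be DESA's dataflow, it omits the usual scaling by the square root of the head dimension, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_row_naive(q: Vector, keys: Sequence[Vector],
                        values: Sequence[Vector]) -> List[float]:
    """Reference: materialize every score, then softmax, then mix values."""
    scores = [dot(q, k) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_row_streaming(q: Vector, keys: Sequence[Vector],
                            values: Sequence[Vector],
                            tile: int = 2) -> List[float]:
    """Same result, but only one tile of scores exists at any time."""
    dim = len(values[0])
    running_max = -math.inf
    denom = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), tile):
        scores = [dot(q, k) for k in keys[start:start + tile]]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        scale = math.exp(running_max - new_max)  # 0.0 on the first tile
        denom *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, values[start:start + tile]):
            w = math.exp(s - new_max)
            denom += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / denom for a in acc]


query = [1.0, 0.5]
ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.3]]
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
naive_out = attention_row_naive(query, ks, vs)
stream_out = attention_row_streaming(query, ks, vs, tile=2)
```

The streaming version keeps a running maximum, a running denominator, and one output accumulator per query, so its working state does not grow with the number of keys.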
Uncover Secrets Through the Cover: A Deep Learning-Based Side-Channel Attack Against Kyber Implementations With Anti-Tampering Covers
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-03-04 DOI: 10.1109/TC.2025.3547610
Peng Chen; Jinnuo Li; Wei Cheng; Chi Cheng
Abstract: In a typical electromagnetic (EM) side-channel attack (SCA) targeting cryptographic implementations, the probe can directly contact the microcontroller. However, in a more practical setting, such as security level 2 of the FIPS 140-3 or ISO/IEC 19790 standards, the microcontroller is required to be safeguarded by an opaque anti-tampering cover. This raises an interesting problem: can we still launch EM attacks against microcontrollers running cryptographic implementations even when they are equipped with the cover? This paper proposes an improved deep-learning-based profiled attack against Kyber, the NIST KEM standard. Our key observation is that the distance between the probe and the microcontroller results in attenuation of signal strength. Moreover, the cover restricts the proximity of the probe, thereby limiting the signal-to-noise ratio. We propose an Adaptive Slimmed Pyramid Network (ASPN) model to instantiate a distinguisher in a plaintext-checking oracle-based SCA, which is generic and easy to implement. The proposed ASPN approach significantly enhances the feature extraction process by employing a pyramid network structure, while simultaneously avoiding the inclusion of excessive parameters. Real-world experiments demonstrate that our proposed distinguishers achieve an accuracy above $99\%$ with an $18$ mm cover and higher than $89\%$ accuracy even with a $24$ mm cover.
Vol. 74, No. 6, pp. 2159-2167 (Journal Article)
Citations: 0
Tangram: Enabling Efficient and Balanced Dynamic Storage Extension on Sharding Blockchain Systems
IF 3.6, Zone 2, Computer Science
IEEE Transactions on Computers Pub Date: 2025-03-04 DOI: 10.1109/TC.2025.3547622
Hao Xu; Jiaqi Zhang; Xiulong Liu; Zhimin Yu; Tingyu Fan; Baochao Chen; Keqiu Li
Abstract: In recent years, sharding technology has been frequently applied in blockchain systems to increase scalability. However, when new shards are added, the system may incur significant computing and networking overhead because the data allocation approach is incompatible with dynamic changes in shards. Currently, S-Store, the state-of-the-art sharding solution built on the account model, has a high re-computing latency when the number of shards grows and an unbalanced sharded data distribution after growth. To address these issues, this paper presents Tangram, an efficient and balanced dynamic storage extension approach for sharding blockchain systems. Tangram reduces system extension overhead and latency while ensuring a balanced shard distribution. In implementing Tangram, we tackle three main technical challenges. (1) Designing a novel state tree structure for the storage and maintenance of sharding state data. We introduce the Jump Merkle Tree (JMT), based on the Merkle Tree, which integrates node migration and orderliness. (2) Presenting a protocol that is compatible with dynamic shard scenarios. We devise a shard addition protocol to improve system extension availability and decrease shard extension delay. (3) Proposing an approach to guarantee system longevity after extension. We first devise algorithms for the state tree to eradicate invalid states after system expansion. Furthermore, we introduce a shard reduction protocol to enhance system storage extension support in complex scenarios, such as cleaning up inactive states to avoid bloating the state tree. We conduct extensive experiments to evaluate the performance of Tangram. Experimental results demonstrate that Tangram outperforms existing solutions, showing reduced latency and superior data balance. Compared to the state-of-the-art sharding storage solution, Tangram decreases the transaction execution time by up to 87.84%, reduces state data migration by approximately 74% or more, and achieves up to a 7.63× improvement in the standard deviation of sharding data balance.
Vol. 74, No. 6, pp. 2031-2044 (Journal Article)
Citations: 0
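Editor's sketch: the abstract says the Jump Merkle Tree builds on the ordinary Merkle tree. As background only, the snippet below computes the root of a plain binary Merkle tree with SHA-256. It is not the JMT; the duplicate-last-node rule for odd levels is one common convention chosen here as an assumption, and the state strings are made up.

```python
import hashlib
from typing import List, Sequence


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: Sequence[bytes]) -> bytes:
    """Root hash of a plain binary Merkle tree over the given leaves.

    Assumption: when a level has an odd number of nodes, the last node
    is paired with a copy of itself.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level: List[bytes] = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical account states held by one shard.
states = [b"alice:10", b"bob:25", b"carol:7"]
root = merkle_root(states)
```

Because the root commits to every leaf, changing or moving any state changes the root, which is why re-computation cost matters when shards are added or removed.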