SmartZone: Runtime Support for Secure and Efficient On-Device Inference on ARM TrustZone
Zhaolong Jian; Xu Liu; Qiankun Dong; Longkai Cheng; Xueshuo Xie; Tao Li
IEEE Transactions on Computers, vol. 74, no. 6, pp. 2144-2158. Published 2025-04-08.
DOI: 10.1109/TC.2025.3557971
URL: https://ieeexplore.ieee.org/document/10949698/
Citations: 0
Abstract
On-device inference is a burgeoning paradigm that performs model inference locally on end devices, allowing private data to remain local. ARM TrustZone, a widely supported trusted execution environment, has been applied to provide confidentiality protection for on-device inference. However, with the rise of large-scale models such as large language models (LLMs), TrustZone-based on-device inference faces the challenges of difficult migration and inefficient execution. The rudimentary TEE OS on TrustZone lacks both the inference runtime needed for building models and the parallel support necessary to accelerate inference. Moreover, the limited secure memory resources on end devices further constrain the model size and degrade performance. In this paper, we propose SmartZone to provide runtime support for secure and efficient on-device inference on TrustZone. SmartZone consists of three main components: (1) a trusted inference-oriented operator set, which provides the underlying mechanisms, adapted to TrustZone's execution mode, for trusted inference of DNN models and LLMs; (2) proactive multi-threading parallel support, which increases the number of CPU cores in the secure state via cross-world thread collaboration to achieve parallelism; and (3) an on-demand secure memory management method, which statically allocates an appropriate amount of secure memory based on pre-execution resource analysis. We implement a prototype of SmartZone on the Raspberry Pi 3B+ board and evaluate it on four well-known DNN models and the llama2 LLM. Extensive experimental results show that SmartZone provides end-to-end protection for on-device inference while maintaining excellent performance. Compared to the original trusted inference, SmartZone accelerates inference by up to 4.26× and reduces energy consumption by 65.81%.
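The on-demand secure memory management idea in component (3) can be illustrated with a small sketch. This is a hypothetical model, not SmartZone's actual implementation: before execution, it walks a model's per-layer output shapes, computes the peak footprint of simultaneously live tensors, and uses that as the static secure-memory budget instead of a worst-case pool. The function name, shapes, and sequential-execution assumption are all illustrative.

```python
# Hypothetical sketch of pre-execution resource analysis for static
# secure-memory sizing (illustrative only; not SmartZone's actual code).

def peak_activation_bytes(layer_output_shapes, dtype_bytes=4):
    """Estimate the peak memory needed for live tensors, assuming a
    sequential model where only the current layer's input and output
    are alive at the same time."""
    def numel(shape):
        n = 1
        for d in shape:
            n *= d
        return n

    peak = 0
    prev = 0  # bytes of the previous layer's output (current input)
    for shape in layer_output_shapes:
        cur = numel(shape) * dtype_bytes
        peak = max(peak, prev + cur)  # input + output live together
        prev = cur
    return peak


# Example: a toy 3-layer model's per-layer output shapes (fp32).
shapes = [(1, 64, 28, 28), (1, 128, 14, 14), (1, 10)]
budget = peak_activation_bytes(shapes)
# The secure world would then be configured with `budget` bytes of
# secure memory before inference starts.
```

The point of the pre-execution pass is that the secure memory size is fixed once, up front, rather than grown dynamically during inference, which matches the abstract's description of static allocation based on resource analysis.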
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.