SPViT: Accelerate Vision Transformer Inference on Mobile Devices via Adaptive Splitting and Offloading
Authors: Sifan Zhao; Tongtong Liu; Hai Jin; Dezhong Yao
DOI: 10.1109/TMC.2025.3562721
Journal: IEEE Transactions on Mobile Computing, vol. 24, no. 10, pp. 9303-9318
Published: 2025-04-21
URL: https://ieeexplore.ieee.org/document/10971255/
Citations: 0
Abstract
The Vision Transformer (ViT), which benefits from self-attention mechanisms, has demonstrated superior accuracy compared to CNNs. However, due to their high computational cost, deploying ViTs and running inference on resource-constrained mobile devices remains a challenge. To address this challenge, we conducted an empirical analysis to identify the performance bottlenecks of deploying ViTs on mobile devices and explored viable solutions. In this paper, we propose SPViT, an adaptive splitting and offloading method that accelerates ViT inference on mobile devices. SPViT executes collaborative ViT inference across available edge devices. We introduce a fine-grained splitting technique for the vision transformer structure. Furthermore, we propose an algorithm based on an autoregressive model to predict partition latency and adaptively offload partitions. Finally, we design offline and online optimization methods to minimize the computational and communication overhead on each device. Based on real-world prototype experiments, SPViT reduces inference latency by 2.2x to 3.3x across four state-of-the-art models.
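The abstract only names an autoregressive (AR) model as the basis of SPViT's partition-latency predictor, without giving its formulation. As a rough illustration of the general idea, the sketch below fits an AR(p) model to a history of observed partition latencies via least squares and forecasts the next value; the function names (`fit_ar`, `predict_next`) and the plain AR(p) form are assumptions for illustration, not SPViT's actual algorithm.

```python
import numpy as np

def fit_ar(latencies, order=3):
    """Fit AR(p) coefficients (plus an intercept) to a latency
    history by ordinary least squares. Hypothetical sketch of an
    AR-style latency predictor, not SPViT's published algorithm."""
    y = np.asarray(latencies, dtype=float)
    # Each design row holds the p most recent lags and a bias term:
    # [y[t-1], y[t-2], ..., y[t-p], 1] -> target y[t]
    rows = [np.concatenate([y[t - order:t][::-1], [1.0]])
            for t in range(order, len(y))]
    X = np.vstack(rows)
    targets = y[order:]
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return coef

def predict_next(latencies, coef, order=3):
    """Forecast the next latency from the `order` newest samples."""
    recent = np.asarray(latencies[-order:], dtype=float)[::-1]
    return float(np.dot(coef[:-1], recent) + coef[-1])
```

In a collaborative-inference setting, a predictor of this shape could be refit online per device as new partition latencies are measured, so the offloading decision tracks changing network and compute conditions.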
Journal Introduction:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.