{"title":"基于传感器的高效人体活动识别的二值化变压器","authors":"Fei Luo;Anna Li;Salabat Khan;Kaishun Wu;Lu Wang","doi":"10.1109/TMC.2025.3526166","DOIUrl":null,"url":null,"abstract":"Transformer architectures are popularized in both vision and natural language processing tasks, and they have achieved new performance benchmarks because of their long-term dependencies modeling, efficient parallel processing, and increased model capacity. While transformers offer powerful capabilities, their demanding computational requirements clash with the real-time and energy-efficient needs of edge-oriented human activity recognition. It is necessary to compress the transformer to reduce its memory consumption and accelerate the inference. In this paper, we investigated the binarization of a transformer-DeepViT for efficient human activity recognition. For feeding sensor signals into DeepViT, we first processed sensor signals to spectrograms by using wavelet transform. Then we applied three methods to binarize DeepViT and evaluated it on three public benchmark datasets for sensor-based human activity recognition. Compared to the full-precision DeepViT, the fully binarized one (Bi-DeepViT) reduced about 96.7% model size and 99% BOPs (Bit Operations) with only a little accuracy compromised. Furthermore, we explored the effects of binarizing various components and latent binarization of DeepViT to understand their impact on the model. We also validated the performance of Bi-DeepViTs on two wireless sensing datasets. The result shows that a certain partial binarization can improve the performance of DeepViT. 
Our work is the first to apply a binarized transformer in HAR.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 5","pages":"4419-4433"},"PeriodicalIF":7.7000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bi-DeepViT: Binarized Transformer for Efficient Sensor-Based Human Activity Recognition\",\"authors\":\"Fei Luo;Anna Li;Salabat Khan;Kaishun Wu;Lu Wang\",\"doi\":\"10.1109/TMC.2025.3526166\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Transformer architectures are popularized in both vision and natural language processing tasks, and they have achieved new performance benchmarks because of their long-term dependencies modeling, efficient parallel processing, and increased model capacity. While transformers offer powerful capabilities, their demanding computational requirements clash with the real-time and energy-efficient needs of edge-oriented human activity recognition. It is necessary to compress the transformer to reduce its memory consumption and accelerate the inference. In this paper, we investigated the binarization of a transformer-DeepViT for efficient human activity recognition. For feeding sensor signals into DeepViT, we first processed sensor signals to spectrograms by using wavelet transform. Then we applied three methods to binarize DeepViT and evaluated it on three public benchmark datasets for sensor-based human activity recognition. Compared to the full-precision DeepViT, the fully binarized one (Bi-DeepViT) reduced about 96.7% model size and 99% BOPs (Bit Operations) with only a little accuracy compromised. Furthermore, we explored the effects of binarizing various components and latent binarization of DeepViT to understand their impact on the model. We also validated the performance of Bi-DeepViTs on two wireless sensing datasets. 
The result shows that a certain partial binarization can improve the performance of DeepViT. Our work is the first to apply a binarized transformer in HAR.\",\"PeriodicalId\":50389,\"journal\":{\"name\":\"IEEE Transactions on Mobile Computing\",\"volume\":\"24 5\",\"pages\":\"4419-4433\"},\"PeriodicalIF\":7.7000,\"publicationDate\":\"2025-01-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Mobile Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10829799/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10829799/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Bi-DeepViT: Binarized Transformer for Efficient Sensor-Based Human Activity Recognition
Transformer architectures have become popular in both vision and natural language processing tasks, setting new performance benchmarks thanks to their long-range dependency modeling, efficient parallel processing, and increased model capacity. While transformers offer powerful capabilities, their demanding computational requirements clash with the real-time, energy-efficient needs of edge-oriented human activity recognition (HAR). Compressing the transformer is therefore necessary to reduce its memory consumption and accelerate inference. In this paper, we investigated the binarization of a transformer, DeepViT, for efficient human activity recognition. To feed sensor signals into DeepViT, we first converted them into spectrograms using the wavelet transform. We then applied three methods to binarize DeepViT and evaluated the results on three public benchmark datasets for sensor-based human activity recognition. Compared to the full-precision DeepViT, the fully binarized model (Bi-DeepViT) reduced model size by about 96.7% and BOPs (bit operations) by about 99%, with only a small loss of accuracy. Furthermore, we explored the effects of binarizing individual components and of latent binarization to understand their impact on the model. We also validated the performance of Bi-DeepViTs on two wireless sensing datasets. The results show that certain partial binarizations can improve the performance of DeepViT. Our work is the first to apply a binarized transformer to HAR.
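As context for the compression figures above: weight binarization typically replaces each full-precision (32-bit) weight with a 1-bit code in {-1, +1} plus a real-valued scaling factor, which is where the roughly 96.7% model-size reduction comes from. The sketch below illustrates this general idea with a per-tensor scaling factor equal to the mean absolute weight; it is an assumption-based illustration, not the paper's exact binarization scheme, and the function names are hypothetical.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight matrix to alpha * sign(w).

    alpha is the mean absolute value of w, a common choice of
    per-tensor scaling factor. Each weight then needs only 1 bit
    (its sign) plus one shared float, instead of 32 bits each.
    """
    alpha = np.mean(np.abs(w))           # shared scaling factor
    codes = np.where(w >= 0, 1.0, -1.0)  # 1-bit codes in {-1, +1}
    return alpha * codes, alpha

def binary_linear(x, w):
    """Forward pass of a linear layer using binarized weights."""
    w_bin, _ = binarize_weights(w)
    return x @ w_bin.T

# Toy example: a 8-in / 4-out layer on a batch of 2 inputs.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
x = rng.normal(size=(2, 8))
y = binary_linear(x, w)
```

At inference time, the matrix product against {-1, +1} weights can be realized with XNOR and popcount bit operations rather than floating-point multiplies, which is why binarization also cuts BOPs so drastically.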
Journal introduction:
IEEE Transactions on Mobile Computing addresses key technical issues in mobile computing, including (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span mobile networks and hosts, mobility management, multimedia, operating system support, power management, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advances in mobile computing research.