{"title":"基于传感器的高效人体活动识别的二值化变压器","authors":"Fei Luo;Anna Li;Salabat Khan;Kaishun Wu;Lu Wang","doi":"10.1109/TMC.2025.3526166","DOIUrl":null,"url":null,"abstract":"Transformer architectures are popularized in both vision and natural language processing tasks, and they have achieved new performance benchmarks because of their long-term dependencies modeling, efficient parallel processing, and increased model capacity. While transformers offer powerful capabilities, their demanding computational requirements clash with the real-time and energy-efficient needs of edge-oriented human activity recognition. It is necessary to compress the transformer to reduce its memory consumption and accelerate the inference. In this paper, we investigated the binarization of a transformer-DeepViT for efficient human activity recognition. For feeding sensor signals into DeepViT, we first processed sensor signals to spectrograms by using wavelet transform. Then we applied three methods to binarize DeepViT and evaluated it on three public benchmark datasets for sensor-based human activity recognition. Compared to the full-precision DeepViT, the fully binarized one (Bi-DeepViT) reduced about 96.7% model size and 99% BOPs (Bit Operations) with only a little accuracy compromised. Furthermore, we explored the effects of binarizing various components and latent binarization of DeepViT to understand their impact on the model. We also validated the performance of Bi-DeepViTs on two wireless sensing datasets. The result shows that a certain partial binarization can improve the performance of DeepViT. 
Our work is the first to apply a binarized transformer in HAR.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 5","pages":"4419-4433"},"PeriodicalIF":7.7000,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Bi-DeepViT: Binarized Transformer for Efficient Sensor-Based Human Activity Recognition\",\"authors\":\"Fei Luo;Anna Li;Salabat Khan;Kaishun Wu;Lu Wang\",\"doi\":\"10.1109/TMC.2025.3526166\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Transformer architectures are popularized in both vision and natural language processing tasks, and they have achieved new performance benchmarks because of their long-term dependencies modeling, efficient parallel processing, and increased model capacity. While transformers offer powerful capabilities, their demanding computational requirements clash with the real-time and energy-efficient needs of edge-oriented human activity recognition. It is necessary to compress the transformer to reduce its memory consumption and accelerate the inference. In this paper, we investigated the binarization of a transformer-DeepViT for efficient human activity recognition. For feeding sensor signals into DeepViT, we first processed sensor signals to spectrograms by using wavelet transform. Then we applied three methods to binarize DeepViT and evaluated it on three public benchmark datasets for sensor-based human activity recognition. Compared to the full-precision DeepViT, the fully binarized one (Bi-DeepViT) reduced about 96.7% model size and 99% BOPs (Bit Operations) with only a little accuracy compromised. Furthermore, we explored the effects of binarizing various components and latent binarization of DeepViT to understand their impact on the model. We also validated the performance of Bi-DeepViTs on two wireless sensing datasets. 
The result shows that a certain partial binarization can improve the performance of DeepViT. Our work is the first to apply a binarized transformer in HAR.\",\"PeriodicalId\":50389,\"journal\":{\"name\":\"IEEE Transactions on Mobile Computing\",\"volume\":\"24 5\",\"pages\":\"4419-4433\"},\"PeriodicalIF\":7.7000,\"publicationDate\":\"2025-01-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Mobile Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10829799/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10829799/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Bi-DeepViT: Binarized Transformer for Efficient Sensor-Based Human Activity Recognition
Transformer architectures have become popular in both vision and natural language processing tasks, setting new performance benchmarks thanks to their long-range dependency modeling, efficient parallel processing, and increased model capacity. While transformers offer powerful capabilities, their demanding computational requirements clash with the real-time, energy-efficient needs of edge-oriented human activity recognition (HAR). Compressing the transformer is therefore necessary to reduce its memory consumption and accelerate inference. In this paper, we investigated the binarization of a transformer, DeepViT, for efficient human activity recognition. To feed sensor signals into DeepViT, we first converted them into spectrograms using the wavelet transform. We then applied three methods to binarize DeepViT and evaluated the results on three public benchmark datasets for sensor-based human activity recognition. Compared to the full-precision DeepViT, the fully binarized model (Bi-DeepViT) reduced model size by about 96.7% and BOPs (bit operations) by about 99%, with only a small loss of accuracy. Furthermore, we explored the effects of binarizing individual components and of latent binarization to understand their impact on the model. We also validated the performance of Bi-DeepViTs on two wireless sensing datasets. The results show that certain partial binarizations can improve the performance of DeepViT. Our work is the first to apply a binarized transformer to HAR.
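As context for the compression figures above: weight binarization typically replaces each full-precision (32-bit) weight with a 1-bit code in {-1, +1} plus a real-valued scaling factor, which is where the roughly 96.7% model-size reduction comes from. The sketch below illustrates this general idea with a per-tensor scaling factor equal to the mean absolute weight; it is an assumption-based illustration, not the paper's exact binarization scheme, and the function names are hypothetical.

```python
import numpy as np

def binarize_weights(w):
    """Binarize a weight matrix to alpha * sign(w).

    alpha is the mean absolute value of w, a common choice of
    per-tensor scaling factor. Each weight then needs only 1 bit
    (its sign) plus one shared float, instead of 32 bits each.
    """
    alpha = np.mean(np.abs(w))           # shared scaling factor
    codes = np.where(w >= 0, 1.0, -1.0)  # 1-bit codes in {-1, +1}
    return alpha * codes, alpha

def binary_linear(x, w):
    """Forward pass of a linear layer using binarized weights."""
    w_bin, _ = binarize_weights(w)
    return x @ w_bin.T

# Toy example: a 8-in / 4-out layer on a batch of 2 inputs.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
x = rng.normal(size=(2, 8))
y = binary_linear(x, w)
```

At inference time, the matrix product against {-1, +1} weights can be realized with XNOR and popcount bit operations rather than floating-point multiplies, which is why binarization also cuts BOPs so drastically.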
Journal introduction:
IEEE Transactions on Mobile Computing addresses key technical issues in mobile computing, including (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span mobile networks and hosts, mobility management, multimedia, operating system support, power management, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advances in mobile computing research.