TSM-MobileNetV3: A Novel Lightweight Network Model for Video Action Recognition
Authors: Shuang Zhang, Qing Tong, Zixiang Kong, Han Lin
Published in: 2023 4th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), 2023-06-16
DOI: 10.1109/AINIT59027.2023.10212611 (https://doi.org/10.1109/AINIT59027.2023.10212611)
Citations: 0
Abstract
Deploying video action recognition models on mobile and embedded devices is challenging because of their limited computational resources and storage capacity. To address this issue, we propose a novel lightweight network architecture named TSM-MobileNetV3. Building on the Temporal Shift Module (TSM), we replace the backbone network with MobileNetV3, which is flexible and easy to implement. We evaluate the proposed model on the HMDB51 dataset, using accuracy, inference speed, and model size as the evaluation metrics. Experimental results show that TSM-MobileNetV3 achieves a Top-1 accuracy of 0.70 and a Top-5 accuracy of 0.89, a decrease of only 0.02 in accuracy, while improving inference speed by 50.27% and significantly reducing model size compared to other lightweight models. TSM-MobileNetV3 has been successfully deployed on NVIDIA Jetson devices, where it runs with reasonable agility and response speed. The model shows promising performance on mobile and embedded devices and lowers training and deployment requirements, which makes deployment on edge devices practical. This study offers new insights and directions for designing and applying lightweight models, which have broad application prospects in fields such as smart homes, intelligent surveillance, and autonomous driving. Our team is currently investigating the deployment of this model on simulation platforms such as Unity for further testing.
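The Temporal Shift Module named in the abstract is usually described as moving a small fraction of feature channels one step along the time axis, so that each frame sees some features from its neighbours at no extra multiply-add cost. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `temporal_shift`, the `fold_div` parameter, and the list-of-lists layout (one list of per-channel values per frame, spatial dimensions omitted) are assumptions made for illustration.

```python
def temporal_shift(frames, fold_div=8):
    """Shift a fraction of channels along the time axis (illustrative sketch).

    frames: list of T frames, each a list of C per-channel values.
    The first C // fold_div channels are taken from the next frame,
    the following C // fold_div channels from the previous frame,
    and the remaining channels are left unchanged. Positions with no
    neighbouring frame are filled with zero.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // fold_div  # channels shifted in each direction

    shifted = []
    for t in range(num_frames):
        row = []
        for c in range(num_channels):
            if c < fold:
                # Pull this channel from the next frame (zero at the last frame).
                row.append(frames[t + 1][c] if t + 1 < num_frames else 0.0)
            elif c < 2 * fold:
                # Pull this channel from the previous frame (zero at the first frame).
                row.append(frames[t - 1][c] if t - 1 >= 0 else 0.0)
            else:
                # Leave the remaining channels in place.
                row.append(frames[t][c])
        shifted.append(row)
    return shifted


# Toy input: 3 frames, 8 channels, value = 10 * frame_index + channel_index.
toy_frames = [[10 * t + c for c in range(8)] for t in range(3)]
toy_shifted = temporal_shift(toy_frames, fold_div=4)
```

In a full model this operation would sit inside the backbone's blocks (here, MobileNetV3 according to the abstract), with the 2D convolutions left unchanged; where exactly the authors insert it is not stated in the abstract.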