A lightweight gesture recognition network

IF 2.6 | Zone 4 (Computer Science) | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation Pub Date : 2024-12-20 DOI:10.1016/j.jvcir.2024.104362

Jinzhao Guo, Xuemei Lei, Bo Li

Volume 107, Article 104362. Full text: https://www.sciencedirect.com/science/article/pii/S1047320324003183

Citations: 0

Abstract

Gesture recognition is one of the main methods of human–computer interaction, but the large parameter counts and heavy computation of classification and recognition algorithms make it costly in practical applications. To reduce cost and improve detection efficiency, this paper proposes a lightweight gesture recognition model based on the YOLOv5s framework. First, ShuffleNetV2 is adopted as the backbone network to reduce the computational load and increase the model's detection speed. Second, lightweight modules such as GSConv and VoVGSCSP are introduced into the neck network to further compress the model while maintaining accuracy. Third, the BiFPN (Bi-directional Feature Pyramid Network) structure is incorporated to improve the network's detection accuracy at a lower computational cost. Finally, the Coordinate Attention (CA) mechanism is introduced to strengthen the network's focus on key features. To examine the rationale for introducing the CA mechanism and the BiFPN structure, we analyze the extracted features and use visualization to validate the network's attention on different parts of the feature maps. Experimental results show that the proposed algorithm achieves an average precision of 95.2% on the HD-HaGRID dataset. Compared with the original YOLOv5s model, the proposed model reduces the parameter count by 70.6% and the model size by 69.2%. The model is therefore suitable for real-time gesture classification and detection and shows significant potential for practical applications.
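
Two of the building blocks named in the abstract can be illustrated in a few lines. The sketch below is not the authors' code. It assumes the standard definitions from the original ShuffleNet and EfficientDet work: channel shuffle interleaves channels across groups, and BiFPN's "fast normalized fusion" combines feature maps with non-negative weights that are normalized to sum to roughly one. Whether the paper uses this exact fusion rule is an assumption. The function names and the flat-list representation of features are hypothetical, chosen so the example runs with no dependencies.

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups (ShuffleNet-style channel shuffle).

    Equivalent to reshaping to (groups, n // groups), transposing,
    and flattening again.
    """
    n = len(channels)
    if n % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def fast_normalized_fusion(features, weights, eps=1e-4):
    """Weighted fusion of equally sized feature vectors (BiFPN-style).

    Each weight is clamped to be non-negative (ReLU), then divided by
    the sum of all weights plus a small eps for numerical stability.
    """
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature input")
    w = [max(0.0, x) for x in weights]      # ReLU keeps weights >= 0
    total = eps + sum(w)                    # normalization constant
    length = len(features[0])
    return [sum(wi * f[k] for wi, f in zip(w, features)) / total
            for k in range(length)]


# Illustrative inputs only; these are not values from the paper.
shuffled = channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)
fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
```

With two groups, channels from the two halves alternate in the shuffled output, which is how information is mixed between branches. With equal weights, the fusion is close to a plain average of the inputs.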


Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months

Journal description: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.