TQU-HG dataset and comparative study for hand gesture recognition of RGB-based images using deep learning

Q2 Mathematics
Van-Dinh Do, Van-Hung Le, Huu-Son Do, Van-Nam Phan, Trung-Hieu Te
Journal: Indonesian Journal of Electrical Engineering and Computer Science
Publication date: 2024-06-01
Publication type: Journal Article
DOI: 10.11591/ijeecs.v34.i3.pp1603-1617 (https://doi.org/10.11591/ijeecs.v34.i3.pp1603-1617)
Citations: 0

Abstract

Hand gesture recognition has important applications in human-computer interaction (HCI), human-robot interaction (HRI), and support for deaf and mute people. To achieve strong results, a hand gesture recognition model built with deep learning (DL) must be trained on a large amount of data collected under many different conditions and contexts. In this paper, we publish the TQU-HG dataset, a large set of RGB images captured at low resolution (640×480 pixels), in low-light conditions, and at a fast speed (16 fps). The TQU-HG dataset includes 60,000 images collected from 20 people (10 male, 10 female) performing 15 gestures with both the left and right hands. We present a comparative study with two branches: i) methods based on Mediapipe TML and ii) methods based on convolutional neural networks (CNNs): you only look once (YOLO) models (YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLO-Nas), single shot multibox detector (SSD) VGG16, residual network (ResNet)18, ResNext50, ResNet152, ResNext50, MobileNet V3 small, and MobileNet V3 large. The architecture and operation of the CNN models are also introduced in detail. We fine-tune the models and evaluate them on the TQU-HG and HaGRID datasets. The quantitative results of training and testing are presented: on the TQU-HG dataset, the F1-scores of YOLOv8, YOLO-Nas, MobileNet V3 small, and ResNet50 are 98.99%, 98.98%, 99.27%, and 99.36%, respectively; on the HaGRID dataset, the reported F1-scores are 99.21%, 99.37%, 99.36%, 86.4%, and 98.3%. YOLOv8 runs at 6.19 fps on the CPU and 18.28 fps on the GPU.
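The abstract reports results as F1-scores and frame rates without spelling out how either is computed. The sketch below shows one common way to obtain a macro-averaged F1-score from predicted and true gesture labels, and how a frame rate converts to per-frame latency. It is an illustration only: the abstract does not say whether the paper uses macro, micro, or weighted averaging, and the function names and gesture labels here are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every class seen in y_true or y_pred.

    Assumption: macro averaging; the paper may use a different scheme.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels given")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class has no counts.
        denom = 2 * tp[label] + fp[label] + fn[label]
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def ms_per_frame(fps):
    """Average processing time per frame, in milliseconds, for a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # Hypothetical gesture labels, not taken from TQU-HG.
    truth = ["fist", "palm", "fist", "ok"]
    preds = ["fist", "palm", "palm", "ok"]
    print(f"macro F1: {macro_f1(truth, preds):.4f}")
    print(f"CPU latency at 6.19 fps: {ms_per_frame(6.19):.1f} ms")
    print(f"GPU latency at 18.28 fps: {ms_per_frame(18.28):.1f} ms")
```

Read this way, the reported YOLOv8 rates of 6.19 fps and 18.28 fps correspond to roughly 162 ms and 55 ms per frame, respectively.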
Source journal
CiteScore
2.90
Self-citation rate
0.00%
Articles published
782
About the journal: The aim of the Indonesian Journal of Electrical Engineering and Computer Science (formerly TELKOMNIKA Indonesian Journal of Electrical Engineering) is to publish high-quality articles dedicated to all aspects of the latest outstanding developments in the field of electrical engineering. Its scope encompasses the applications of Telecommunication and Information Technology, Applied Computing and Computer, Instrumentation and Control, Electrical (Power), Electronics Engineering and Informatics, covering, but not limited to, the following areas: Signal Processing[...] Electronics[...] Electrical[...] Telecommunication[...] Instrumentation & Control[...] Computing and Informatics[...]