Hand Movement Identification Using Single-Stream Spatial Convolutional Neural Networks

Aldi Sidik Permana, E. C. Djamal, Fikri Nugraha, Fatan Kasyidi
{"title":"基于单流空间卷积神经网络的手部运动识别","authors":"Aldi Sidik Permana, E. C. Djamal, Fikri Nugraha, Fatan Kasyidi","doi":"10.23919/EECSI50503.2020.9251896","DOIUrl":null,"url":null,"abstract":"Human-robot interaction can be through several ways, such as through device control, sounds, brain, and body, or hand gesture. There are two main issues: the ability to adapt to extreme settings and the number of frames processed concerning memory capabilities. Although it is necessary to be careful with the selection of the number of frames so as not to burden the memory, this paper proposed identifying hand gesture of video using Spatial Convolutional Neural Networks (CNN). The sequential image's spatial arrangement is extracted from the frames contained in the video so that each frame can be identified as part of one of the hand movements. The research used VGG16, as CNN architecture is concerned with the depth of learning where there are 13 layers of convolution and three layers of identification. Hand gestures can only be identified into four movements, namely ‘right’, ‘left’, ‘grab’, and ‘phone’. Hand gesture identification on the video using Spatial CNN with an initial accuracy of 87.97%, then the second training increased to 98.05%. Accuracy was obtained after training using 5600 training data and 1120 test data, and the improvement occurred after manual noise reduction was performed.","PeriodicalId":6743,"journal":{"name":"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)","volume":"28 1","pages":"172-176"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hand Movement Identification Using Single-Stream Spatial Convolutional Neural Networks\",\"authors\":\"Aldi Sidik Permana, E. C. Djamal, Fikri Nugraha, Fatan Kasyidi\",\"doi\":\"10.23919/EECSI50503.2020.9251896\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Human-robot interaction can be through several ways, such as through device control, sounds, brain, and body, or hand gesture. There are two main issues: the ability to adapt to extreme settings and the number of frames processed concerning memory capabilities. Although it is necessary to be careful with the selection of the number of frames so as not to burden the memory, this paper proposed identifying hand gesture of video using Spatial Convolutional Neural Networks (CNN). The sequential image's spatial arrangement is extracted from the frames contained in the video so that each frame can be identified as part of one of the hand movements. The research used VGG16, as CNN architecture is concerned with the depth of learning where there are 13 layers of convolution and three layers of identification. Hand gestures can only be identified into four movements, namely ‘right’, ‘left’, ‘grab’, and ‘phone’. Hand gesture identification on the video using Spatial CNN with an initial accuracy of 87.97%, then the second training increased to 98.05%. 
Accuracy was obtained after training using 5600 training data and 1120 test data, and the improvement occurred after manual noise reduction was performed.\",\"PeriodicalId\":6743,\"journal\":{\"name\":\"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)\",\"volume\":\"28 1\",\"pages\":\"172-176\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/EECSI50503.2020.9251896\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 7th International Conference on Electrical Engineering, Computer Sciences and Informatics (EECSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/EECSI50503.2020.9251896","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Human-robot interaction can take place in several ways, such as through device control, sound, brain signals, body movement, or hand gestures. There are two main issues: the ability to adapt to extreme settings, and the number of frames that can be processed given memory constraints. Since the number of frames must be chosen carefully so as not to burden memory, this paper proposes identifying hand gestures in video using a spatial Convolutional Neural Network (CNN). The spatial arrangement of the sequential images is extracted from the frames contained in the video so that each frame can be identified as part of one of the hand movements. The research used VGG16, a CNN architecture concerned with depth of learning, consisting of 13 convolutional layers and three identification layers. Hand gestures are identified as one of four movements, namely 'right', 'left', 'grab', and 'phone'. Hand gesture identification on video using the spatial CNN achieved an initial accuracy of 87.97%, which rose to 98.05% after a second round of training. Accuracy was measured after training with 5,600 training samples and 1,120 test samples, and the improvement followed manual noise reduction.
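The abstract describes a single-stream spatial CNN built on VGG16 (13 convolutional layers plus three identification layers) that classifies individual video frames into one of four gesture classes. The following is a minimal sketch of such a model in Keras, not the authors' implementation; the input frame size (224x224 RGB), the two 4096-unit dense layers, the choice of weights, and the optimizer and loss settings are assumptions, since the abstract does not specify them.

```python
# Minimal sketch (assumptions noted in comments) of a VGG16-based spatial CNN
# that classifies single video frames into four hand-gesture classes.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # 'right', 'left', 'grab', 'phone'

# VGG16 backbone: 13 convolutional layers.
# weights=None because the paper does not state whether pretrained weights were used.
base = tf.keras.applications.VGG16(
    include_top=False,          # drop the original 1000-class head
    weights=None,
    input_shape=(224, 224, 3),  # assumed frame size
)

# Three "identification" (fully connected) layers, mirroring the standard VGG16 top,
# with the final layer reduced to the four gesture classes.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dense(4096, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Optimizer and loss are assumptions; the abstract only reports the resulting accuracy.
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```

Each frame extracted from a video clip would be passed through this network independently, and the per-frame predictions identify which of the four hand movements the frame belongs to.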