Skeletal-based Classification for Human Activity Recognition

Agung Suhendar, Tri Ayuningsih, S. Suyanto
2022 IEEE International Conference on Cybernetics and Computational Intelligence (CyberneticsCom), published 2022-06-16
DOI: 10.1109/CyberneticsCom55287.2022.9865354 (https://doi.org/10.1109/CyberneticsCom55287.2022.9865354)
Citations: 0

Abstract

Human activity recognition (HAR) is critical for determining human interactions and interpersonal relationships. HAR has two main concerns: the type of activity and its localization. Most HAR tasks involve identifying a human scene from a series of video frames in which the monitored subject is free to perform an activity. Some current HAR approaches use 3D sensors to extract the skeleton or body pose of the monitored subject, which is much more precise than using only the 2D information obtained from conventional cameras. However, the need for 3D sensors is a significant limitation when implementing video-based surveillance systems. In this research, we use the deep learning OpenPose 3D method as a substitute for 3D sensors; it estimates the 3D skeleton/pose of the subject's body from conventional 2D camera input. It is then combined with other machine learning methods that classify the activity from the estimated 3D skeleton. The classifiers that can be used include Support Vector Machine (SVM), Neural Network (NN), Long Short-Term Memory (LSTM), and Transformer. Thus, HAR can be applied flexibly in various surveillance settings without the help of 3D sensors. The experimental results show that the Transformer is the best in accuracy, while the SVM is the best in speed.
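The pipeline described in the abstract (per-frame 3D joints, then a classifier over the sequence) can be illustrated with a minimal sketch. This is not the authors' implementation: the joint layout (`ROOT`, `NECK`), the normalization, the toy clips, and the nearest-centroid classifier are all assumptions chosen for illustration, and the nearest-centroid rule merely stands in for the SVM, NN, LSTM, or Transformer classifiers named above.

```python
from math import sqrt

# Hypothetical joint indices; the real layout depends on the pose estimator's output.
ROOT, NECK = 0, 1

def normalize_frame(frame):
    """Center joints on the root joint and scale by the root-to-neck distance."""
    rx, ry, rz = frame[ROOT]
    nx, ny, nz = frame[NECK]
    scale = sqrt((nx - rx) ** 2 + (ny - ry) ** 2 + (nz - rz) ** 2) or 1.0
    return [((x - rx) / scale, (y - ry) / scale, (z - rz) / scale)
            for x, y, z in frame]

def sequence_features(frames):
    """Flatten a sequence of normalized frames into one feature vector."""
    return [c for frame in frames
            for joint in normalize_frame(frame)
            for c in joint]

def fit_centroids(samples, labels):
    """Average the feature vectors of each class (stand-in classifier)."""
    sums, counts = {}, {}
    for vec, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(centroids[lab], vec)))

def toy_clip(hand_heights, offset=0.0, size=1.0):
    """Synthetic 3-joint clip (root, neck, hand); purely illustrative data."""
    return [[(offset, 0.0, 0.0),
             (offset, size, 0.0),
             (offset + 0.5 * size, h * size, 0.0)]
            for h in hand_heights]

# Two training clips: the hand stays low ("idle") or rises ("raise_hand").
train = [sequence_features(toy_clip([0.2, 0.2])),
         sequence_features(toy_clip([0.2, 1.8]))]
centroids = fit_centroids(train, ["idle", "raise_hand"])

# A test subject standing elsewhere and twice as large; normalization removes both.
query = sequence_features(toy_clip([0.25, 1.7], offset=3.0, size=2.0))
label = predict(centroids, query)
```

Centering on a root joint and scaling by a body segment length is one common way to make skeleton features independent of where the subject stands and how large they appear; a sequence model such as an LSTM or Transformer would consume the per-frame vectors in order rather than one flattened vector.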