Online Human Action Recognition Using Deep Learning for Indoor Smart Mobile Robots

Jih-Tang Hsieh, Meng-Lin Chiang, C. Fang, Sei-Wang Chen
{"title":"Online Human Action Recognition Using Deep Learning for Indoor Smart Mobile Robots","authors":"Jih-Tang Hsieh, Meng-Lin Chiang, C. Fang, Sei-Wang Chen","doi":"10.1109/ICCCIS51004.2021.9397242","DOIUrl":null,"url":null,"abstract":"This research proposes a vision-based online human action recognition system. This system uses deep learning methods to recognise human action under moving camera circumstances. The proposed system consists of five stages: human detection, human tracking, feature extraction, action classification and fusion. The system uses three kinds of input information: colour intensity, short-term dynamic information and skeletal joints. In the human detection stage, a two-dimensional (2D) pose estimator method is used to detect a human. In the human tracking stage, a deep SORT tracking method is used to track the human. In the feature extraction stage, three kinds of features, spatial, temporal and structural, are extracted to analyse human actions. In the action classification stage, three kinds of features of human actions are respectively classified by three kinds of long short-term memory (LSTM) classifiers. In the fusion stage, a fusion method is used to leverage the three output results from the LSTM classifiers. This study constructs a computer vision and image understanding (CVIU) Moving Camera Human Action dataset (CVIU dataset), containing 3,646 human action sequences, including 11 types of single human actions and 5 types of interactive human actions. This dataset was used to train and evaluate the proposed system. Experimental results showed that the recognition rates of spatial features, temporal features and structural features were 96.64%, 81.87% and 68.10%, respectively. Finally, the fusion result of human action recognition for indoor smart mobile robots in this study was 96.84%.","PeriodicalId":316752,"journal":{"name":"2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCCIS51004.2021.9397242","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

This research proposes a vision-based online human action recognition system. This system uses deep learning methods to recognise human action under moving camera circumstances. The proposed system consists of five stages: human detection, human tracking, feature extraction, action classification and fusion. The system uses three kinds of input information: colour intensity, short-term dynamic information and skeletal joints. In the human detection stage, a two-dimensional (2D) pose estimator method is used to detect a human. In the human tracking stage, a deep SORT tracking method is used to track the human. In the feature extraction stage, three kinds of features, spatial, temporal and structural, are extracted to analyse human actions. In the action classification stage, three kinds of features of human actions are respectively classified by three kinds of long short-term memory (LSTM) classifiers. In the fusion stage, a fusion method is used to leverage the three output results from the LSTM classifiers. This study constructs a computer vision and image understanding (CVIU) Moving Camera Human Action dataset (CVIU dataset), containing 3,646 human action sequences, including 11 types of single human actions and 5 types of interactive human actions. This dataset was used to train and evaluate the proposed system. Experimental results showed that the recognition rates of spatial features, temporal features and structural features were 96.64%, 81.87% and 68.10%, respectively. Finally, the fusion result of human action recognition for indoor smart mobile robots in this study was 96.84%.
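The abstract describes a three-stream architecture: spatial, temporal and structural features are each classified by their own LSTM, and the three outputs are then fused. The sketch below illustrates one plausible way to wire this up in PyTorch; it is not the authors' code. The feature dimensions, hidden size, class count (16 = 11 single + 5 interactive actions) and the score-averaging fusion rule are assumptions for illustration, since the abstract does not specify them.

```python
# Minimal sketch (not the authors' implementation) of three per-stream LSTM
# classifiers with late fusion, as outlined in the abstract. Feature
# dimensions, hidden size, class count and the averaging fusion rule are
# illustrative assumptions.
import torch
import torch.nn as nn

class StreamLSTM(nn.Module):
    """One LSTM classifier for a single feature stream (spatial, temporal, or structural)."""
    def __init__(self, feat_dim, hidden_dim=128, num_classes=16):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        _, (h_n, _) = self.lstm(x)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])            # per-sequence class logits

class ThreeStreamActionRecogniser(nn.Module):
    """Spatial, temporal and structural streams classified separately, then fused."""
    def __init__(self, dims=(2048, 1024, 75), num_classes=16):
        super().__init__()
        self.streams = nn.ModuleList(
            [StreamLSTM(d, num_classes=num_classes) for d in dims]
        )

    def forward(self, spatial, temporal, structural):
        logits = [s(x) for s, x in zip(self.streams, (spatial, temporal, structural))]
        # Late fusion: average the per-stream softmax scores. This is one
        # plausible fusion rule; the paper's exact method is not given in the abstract.
        probs = torch.stack([torch.softmax(l, dim=-1) for l in logits]).mean(dim=0)
        return probs

# Usage example with dummy 30-frame sequences (batch of 2).
model = ThreeStreamActionRecogniser()
spatial = torch.randn(2, 30, 2048)      # e.g. CNN features from colour frames
temporal = torch.randn(2, 30, 1024)     # e.g. short-term dynamic (motion) features
structural = torch.randn(2, 30, 75)     # e.g. 25 skeletal joints x (x, y, score)
print(model(spatial, temporal, structural).argmax(dim=-1))
```

Keeping the streams as independent classifiers and fusing only their scores mirrors the abstract's description, where each feature type is evaluated separately (96.64%, 81.87% and 68.10%) before fusion raises the overall rate to 96.84%.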