A View Invariant Human Action Recognition System for Noisy Inputs

2022 19th Conference on Robots and Vision (CRV), Pub Date: 2022-05-01, DOI: 10.1109/CRV55824.2022.00017

Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad


Citations: 0

Abstract


We propose a skeleton-based Human Action Recognition (HAR) system that is robust to both noisy inputs and perspective variation. The system takes RGB videos as input and consists of three modules: (M1) a 2D Key-Points Estimation module, (M2) a Robustness module, and (M3) an Action Classification module, of which M2 is our main contribution. M2 uses pre-trained 3D pose estimation and pose refinement networks to handle noisy information, including missing key points, and applies rotations to the 3D poses to add robustness to camera viewpoint variation. To evaluate our approach, we compared models trained with and without M2 on the UESTC view-varying dataset, the i3DPost multi-view human action dataset, and a Boxing Actions dataset that we created. Adding M2 improved accuracy by 24%, 3% and 11% on these datasets, respectively. On the UESTC dataset, our method achieves a new state of the art under the cross-view evaluation protocols.
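The abstract describes M2 only at a high level: repair noisy or missing 3D joints, then rotate the 3D poses to simulate other camera viewpoints. The following is a minimal sketch of those two ideas on already-lifted 3D skeletons, not the authors' implementation. It assumes joints are stored as (x, y, z) tuples with y as the vertical axis, missing joints are marked `None`, and the root joint sits at index 0. Linear interpolation over time is a simple stand-in swapped in for the paper's pre-trained pose refinement network, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Joint = Tuple[float, float, float]   # (x, y, z); y is assumed to be vertical
Pose = List[Optional[Joint]]         # one entry per joint; None = missing


def fill_missing_joints(frames: List[Pose]) -> List[List[Joint]]:
    """Fill missing joints by per-joint linear interpolation over time.

    Hypothetical stand-in for a learned pose-refinement network: gaps between
    two observed frames are interpolated; leading/trailing gaps copy the
    nearest observed value. A joint never observed raises ValueError.
    """
    n_frames, n_joints = len(frames), len(frames[0])
    out = [[(0.0, 0.0, 0.0)] * n_joints for _ in range(n_frames)]
    for j in range(n_joints):
        seen = [t for t in range(n_frames) if frames[t][j] is not None]
        if not seen:
            raise ValueError(f"joint {j} is missing in every frame")
        for t in range(n_frames):
            if frames[t][j] is not None:
                out[t][j] = frames[t][j]
                continue
            prev = max((s for s in seen if s < t), default=None)
            nxt = min((s for s in seen if s > t), default=None)
            if prev is None:            # gap at the start: copy next value
                out[t][j] = frames[nxt][j]
            elif nxt is None:           # gap at the end: copy previous value
                out[t][j] = frames[prev][j]
            else:                       # interior gap: linear interpolation
                w = (t - prev) / (nxt - prev)
                a, b = frames[prev][j], frames[nxt][j]
                out[t][j] = tuple(ai + w * (bi - ai) for ai, bi in zip(a, b))
    return out


def rotate_about_vertical(pose: List[Joint], angle_deg: float) -> List[Joint]:
    """Rotate every joint about the vertical (y) axis through the origin."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]


def augment_views(frames: List[List[Joint]], angles_deg: Sequence[float],
                  root: int = 0) -> List[List[List[Joint]]]:
    """Return one rotated copy of the sequence per angle.

    The sequence is first centred on the root joint of the first frame, so
    the skeleton turns in place while its motion over time is preserved.
    """
    ox, oy, oz = frames[0][root]
    centred = [[(x - ox, y - oy, z - oz) for x, y, z in pose] for pose in frames]
    return [[rotate_about_vertical(p, a) for p in centred] for a in angles_deg]


if __name__ == "__main__":
    clip = [
        [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
        [(0.0, 1.0, 0.0), None],             # joint 1 lost by the estimator
        [(0.0, 1.0, 0.0), (1.0, 1.0, 2.0)],
    ]
    filled = fill_missing_joints(clip)
    print(filled[1][1])                      # (1.0, 1.0, 1.0)
    views = augment_views(filled, [0.0, 90.0, 180.0])
    print(len(views))                        # 3
```

Each rotated copy would then be fed to the classifier (M3) as an extra training sample. Rotating only about the vertical axis reflects the common case of cameras placed around a standing subject; other axes or angle ranges are equally plausible choices.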
