Classifying In-Place Gestures with End-to-End Point Cloud Learning

2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR). Pub Date: 2021-08-22. DOI: 10.1109/ismar52148.2021.00038

Lizhi Zhao, Xuequan Lu, Mingde Zhao, Meili Wang


Citations: 5

Abstract



Walking in place as a way of moving through virtual environments has attracted noticeable attention recently. Recent attempts have focused on training a classifier to recognize certain gesture patterns (e.g., standing, walking) with neural networks such as CNNs or LSTMs. Nevertheless, they often consider very few types of gestures and/or introduce undesirable latency in virtual environments. In this paper, we propose a novel framework for accurate and efficient classification of in-place gestures. Our key idea is to treat several consecutive frames as a “point cloud”. The HMD and two VIVE trackers provide three points in each frame, with each point consisting of 12-dimensional features (i.e., three-dimensional position coordinates, velocity, rotation, and angular velocity). We create a dataset consisting of 9 gesture classes for virtual in-place locomotion. In addition to the supervised point-based network, we also take unsupervised domain adaptation into account because of inter-person variations. To this end, we develop an end-to-end joint framework involving both a supervised loss for supervised point learning and an unsupervised loss for unsupervised domain adaptation. Experiments demonstrate that our approach achieves high overall classification accuracy (95.0%) and real-time performance (192 ms latency). We will release our dataset and source code to the community.
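The two ideas in the abstract, flattening a window of frames into a point set and combining a supervised loss with an unsupervised one, can be illustrated with a minimal sketch. This is not the authors' implementation. The window length, the feature layout (3 values each for position, velocity, rotation, and angular velocity), the function names, the loss weight, and the use of entropy minimization as the unsupervised term are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from typing import List, Sequence, Tuple

NUM_DEVICES = 3          # HMD + two VIVE trackers, as stated in the abstract
FEATURES_PER_POINT = 12  # assumed layout: position, velocity, rotation, angular velocity (3 values each)

Point = Sequence[float]
Frame = Sequence[Point]


def frames_to_point_cloud(frames: Sequence[Frame]) -> List[List[float]]:
    """Flatten T consecutive frames of 3 tracked points into one set of T*3 points."""
    cloud: List[List[float]] = []
    for t, frame in enumerate(frames):
        if len(frame) != NUM_DEVICES:
            raise ValueError(f"frame {t}: expected {NUM_DEVICES} points, got {len(frame)}")
        for point in frame:
            if len(point) != FEATURES_PER_POINT:
                raise ValueError(f"frame {t}: expected {FEATURES_PER_POINT} features per point")
            cloud.append(list(point))
    return cloud


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of class scores."""
    peak = max(logits)
    log_total = math.log(sum(math.exp(x - peak) for x in logits))
    return [x - peak - log_total for x in logits]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Supervised term: negative log-probability of the true gesture class."""
    return -log_softmax(logits)[label]


def prediction_entropy(logits: Sequence[float]) -> float:
    """Hypothetical unsupervised term: entropy of the predicted class distribution."""
    return -sum(math.exp(lp) * lp for lp in log_softmax(logits))


def joint_loss(
    labeled: Sequence[Tuple[Sequence[float], int]],
    unlabeled_logits: Sequence[Sequence[float]],
    weight: float = 0.1,  # assumed trade-off weight, not taken from the paper
) -> float:
    """Mean supervised loss on labeled users plus weighted mean entropy on unlabeled users."""
    supervised = sum(cross_entropy(l, y) for l, y in labeled) / len(labeled)
    unsupervised = sum(prediction_entropy(l) for l in unlabeled_logits) / len(unlabeled_logits)
    return supervised + weight * unsupervised


# Usage: a hypothetical window of 10 frames yields a 30-point cloud of 12-D features.
window = [[[0.0] * FEATURES_PER_POINT for _ in range(NUM_DEVICES)] for _ in range(10)]
cloud = frames_to_point_cloud(window)
print(len(cloud), len(cloud[0]))
```

A point-based network would consume `cloud` and emit one score per gesture class (9 in the paper's dataset). Summing both terms into a single scalar is what allows the supervised and unsupervised objectives to be optimized together in one end-to-end training loop.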

