Simplified Training for Gesture Recognition
Romain Faugeroux, Thales Vieira, D. M. Morera, T. Lewiner
2014 27th SIBGRAPI Conference on Graphics, Patterns and Images
DOI: 10.1109/SIBGRAPI.2014.46 · Published 2014-08-26 · Citations: 4
Abstract
Since gesture is a fundamental form of human communication, its recognition by a computer generates strong interest for many applications such as natural user interfaces and gaming. The popularization of real-time depth sensors brings such applications to the public at large. However, familiar gestures are culture-specific, so their automatic recognition must result from a machine learning process. This requires either teaching the user how to communicate with the machine, as on popular mobile devices or gaming consoles, or tailoring the application to a specific public. The latter option serves a large number of applications such as sport monitoring, virtual reality, or surveillance, although it usually requires tedious training. This work intends to simplify the training required by gesture recognition methods. Whereas the traditional procedure defines and trains a set of key poses before defining and training a set of gestures, we propose to automatically deduce the set of key poses from the gesture training. We represent a recording of gestures as a curve in a high-dimensional space and robustly segment it according to the curvature of that curve. We then use an asymmetric Hausdorff distance between gestures to define a discriminant key pose as the pose of one gesture most distant from the other gestures. This further allows gestures to be grouped dynamically by similarity. The training only requires the user to perform the gestures and, optionally, refine the gesture labeling. The generated set of key poses and gestures then fits into previous human action recognition algorithms. Furthermore, this semi-supervised learning allows a previous training to be reused to extend the set of gestures the computer should recognize. Experimental results show that the automatically generated discriminant key poses lead to recognition accuracy similar to previous work.
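To make the two central notions concrete, the sketch below illustrates an asymmetric (directed) Hausdorff distance between two gestures, each given as a sequence of pose feature vectors, and a discriminant key-pose selection that picks the pose of one gesture farthest from all other gestures. This is a minimal illustration, not the paper's implementation: the pose encoding, the function names, and the brute-force pairwise search are assumptions for the example.

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed (asymmetric) Hausdorff distance from gesture A to gesture B.

    A, B: arrays of shape (n, d) and (m, d); each row is one skeleton pose
    encoded as a d-dimensional feature vector (encoding assumed here).
    Returns max over poses in A of the distance to the closest pose in B.
    Asymmetric: directed_hausdorff(A, B) != directed_hausdorff(B, A) in general.
    """
    # pairwise Euclidean distances between every pose of A and every pose of B
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # shape (n, m)
    return d.min(axis=1).max()

def discriminant_key_pose(gesture, others):
    """Index of the pose of `gesture` that is most distant from every other
    gesture, i.e. the pose realizing the directed Hausdorff distance."""
    # for each pose, track its distance to the nearest pose of any other gesture
    nearest = np.full(len(gesture), np.inf)
    for other in others:
        d = np.linalg.norm(gesture[:, None, :] - other[None, :, :], axis=2)
        nearest = np.minimum(nearest, d.min(axis=1))
    # the pose whose nearest foreign pose is farthest away discriminates best
    return int(nearest.argmax())
```

The asymmetry matters for discrimination: a gesture fully contained in another has zero directed distance to it, while the converse distance exposes the poses the larger gesture has and the smaller one lacks.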