{"title":"Head and hands 3D tracking in real time by the EM algorithm","authors":"O. Bernier, D. Collobert","doi":"10.1109/RATFG.2001.938913","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938913","url":null,"abstract":"This paper presents a method for real-time hand and head tracking, in three dimensions, using two cameras. This tracking is intended as a first step for a gesture recognition system, using the trajectories of the hands, or as input to a real-time clone animation system. The method is based on simple preprocessing followed by the use of a statistical model linking the observations to the parameters: the positions of the hands and the head. Preprocessing consists of background subtraction followed by skin color detection, using a simple color lookup table. The statistical model is composed of three ellipsoids, one for each hand and one for the head. A Gaussian probability density with the same center and size is associated with each ellipsoid. The parameters of the model are adapted to the pixels detected by the preprocessing stage. The EM algorithm is used to obtain the maximum-likelihood parameters. Hand and head tracking is achieved in near real time on a single workstation.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121436849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time 3D hand posture estimation based on 2D appearance retrieval using monocular camera","authors":"N. Shimada, Kousuke Kimura, Y. Shirai","doi":"10.1109/RATFG.2001.938906","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938906","url":null,"abstract":"This paper proposes a system for estimating arbitrary 3D human hand postures in real time. It accepts not only pre-determined hand signs but also arbitrary postures, and it works in a monocular camera environment. The estimation is based on 2D image retrieval. More than 16,000 possible hand appearances are first generated from a given 3D shape model by rotating model joints and stored in an appearance database. Each appearance is tagged with the joint angles used to generate it. By retrieving the database appearance that best matches the input image contour, the joint angles of the input shape can be rapidly obtained. The search area is reduced by using an adjacency map in the database. To prevent tracking failures, a fixed number of well-matching appearances are saved at every frame. After the multiple neighborhoods of the saved appearances are merged, the unified neighborhood is searched efficiently for the estimate by beam search. Posture estimation results from experimental examples are shown.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128711357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Signer-independent sign language recognition based on SOFM/HMM","authors":"Gaolin Fang, Wen Gao, Jiyong Ma","doi":"10.1109/RATFG.2001.938915","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938915","url":null,"abstract":"The aim of sign language recognition is to provide an efficient and accurate mechanism to transcribe sign language into text or speech. State-of-the-art sign language recognition should be able to solve the signer-independent problem for practical application. In this paper, a hybrid SOFM/HMM system, which combines self-organizing feature maps (SOFMs) with hidden Markov models (HMMs), is presented for signer-independent Chinese sign language recognition. We implement the SOFM/HMM sign recognition system; results from an HMM-based system are provided for comparison. Experimental results show the SOFM/HMM system improves recognition accuracy by 5% over the HMM-based one. Furthermore, a self-adjusting recognition algorithm is also proposed for improving SOFM/HMM discrimination; when applied to the SOFM/HMM system, it improves recognition accuracy by a further 1.9%. All experiments were performed in real time with a dictionary size of 208.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133701314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video-based online face recognition using identity surfaces","authors":"Yongmin Li, S. Gong, H. Liddell","doi":"10.1109/RATFG.2001.938908","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938908","url":null,"abstract":"A multi-view dynamic face model is designed to extract the shape-and-pose-free texture patterns of faces. The model provides a precise correspondence for the recognition task, since the 3D shape information is used to warp the multi-view faces onto the model's mean shape in frontal view. The identity surface of each subject is constructed in a discriminant feature space from a sparse set of face texture patterns, or more practically, from one or more learning sequences containing the face of the subject. Instead of matching templates or estimating multi-modal density functions, face recognition can be performed by computing the pattern distances to the identity surfaces or the trajectory distances between the object and model trajectories. Experimental results show that this approach provides an accurate recognition rate, and that using trajectory distances achieves more robust performance, since the trajectories encode spatio-temporal information and contain accumulated evidence about the moving faces in a video input.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117331408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"View-subspace analysis of multi-view face patterns","authors":"S. Li, Xiao-guang Lv, Hong Zhang","doi":"10.1109/RATFG.2001.938921","DOIUrl":"https://doi.org/10.1109/RATFG.2001.938921","url":null,"abstract":"Multi-view face detection and recognition is a challenging problem because the distribution of multi-view faces in a feature space is more dispersed and more complicated than that of frontal faces. This paper presents an investigation into several view-subspace representations of multi-view faces, learned by using independent component analysis (ICA), independent subspace analysis (ISA) and topographic independent component analysis (TICA). It is shown that view-specific basis components can be learned from multi-view face examples in an unsupervised way by using ICA, ISA and TICA, whereas the components learned by using principal component analysis reveal little view-related information. The learned results provide a sensible basis for constructing view-subspaces for multi-view faces. Comparative experiments demonstrate the distinctive properties of the ICA, ISA and TICA results, and the suitability of these results as representations of multi-view faces.","PeriodicalId":355094,"journal":{"name":"Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems","volume":"433 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122470849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}