{"title":"Robust tracking and mapping with a handheld RGB-D camera","authors":"Kyoung-Rok Lee, Truong Q. Nguyen","doi":"10.1109/WACV.2014.6835732","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835732","url":null,"abstract":"In this paper, we propose a robust method for camera tracking and surface mapping using a handheld RGB-D camera, which is effective in challenging situations such as fast camera motion or geometrically featureless scenes. The main contributions are threefold. First, we introduce a robust orientation estimation based on the quaternion method for initial sparse estimation. By using visual feature point detection and matching, no prior or small-movement assumption is required to estimate a rigid transformation between frames. Second, a weighted ICP (Iterative Closest Point) method is proposed for a better rate of convergence in optimization and higher accuracy in the resulting trajectory. While conventional ICP fails when there are no 3D features in the scene, our approach achieves robustness by emphasizing the influence of points that carry more geometric information about the scene. Finally, we show quantitative results on an RGB-D trajectory benchmark dataset; the experiments demonstrate that our method is able to track camera pose accurately.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"32 1","pages":"1120-1127"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72577984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised iterative manifold alignment via local feature histograms","authors":"Ke Fan, A. Mian, Wanquan Liu, Lin Li","doi":"10.1109/WACV.2014.6836051","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836051","url":null,"abstract":"We propose a new unsupervised algorithm for the automatic alignment of two manifolds of different datasets with possibly different dimensionalities. Alignment is performed automatically without any assumptions on the correspondences between the two manifolds. The proposed algorithm automatically establishes an initial set of sparse correspondences between the two datasets by matching their underlying manifold structures. Local feature histograms are extracted at each point of the manifolds and matched using a robust algorithm to find the initial correspondences. Based on these sparse correspondences, an embedding space is estimated in which the distance between the two manifolds is minimized while maximally retaining the original structure of the manifolds. The problem is formulated as a generalized eigenvalue problem and solved efficiently. Dense correspondences are then established between the two manifolds and the process is iterated until the two manifolds are correctly aligned, consequently revealing their joint structure. We demonstrate the effectiveness of our algorithm on aligning protein structures, facial images of different subjects under pose variations, and RGB and depth data from Kinect. Comparison with a state-of-the-art algorithm shows the superiority of the proposed manifold alignment algorithm in terms of accuracy and computational time.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"112 1","pages":"572-579"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74683469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fully automatic 3D facial expression recognition using local depth features","authors":"Mingliang Xue, A. Mian, Wanquan Liu, Ling Li","doi":"10.1109/WACV.2014.6835736","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835736","url":null,"abstract":"Facial expressions form a significant part of our nonverbal communications, and understanding them is essential for effective human-computer interaction. Due to the diversity of facial geometry and expressions, automatic expression recognition is a challenging task. This paper deals with the problem of person-independent facial expression recognition from a single 3D scan. We consider only the 3D shape because facial expressions are mostly encoded in facial geometry deformations rather than textures. Unlike the majority of existing works, our method is fully automatic, including the detection of landmarks. We detect the four eye corners and the nose tip in real time on the depth image and its gradients using Haar-like features and an AdaBoost classifier. From these five points, another 25 heuristic points are defined to extract local depth features for representing facial expressions. The depth features are projected to a lower-dimensional linear subspace where feature selection is performed by maximizing their relevance and minimizing their redundancy. The selected features are then used to train a multi-class SVM for the final classification. Experiments on the benchmark BU-3DFE database show that the proposed method outperforms existing automatic techniques, and is comparable even to approaches that use manual landmarks.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"36 1","pages":"1096-1103"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75177414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simultaneous recognition of facial expression and identity via sparse representation","authors":"M. Mohammadi, E. Fatemizadeh, M. Mahoor","doi":"10.1109/WACV.2014.6835986","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835986","url":null,"abstract":"Automatic recognition of facial expression and facial identity from visual data are two challenging problems that are tied together. In the past decade, researchers have mostly tried to solve these two problems separately, aiming for face identification systems that are expression-independent and facial expression recognition systems that are person-independent. This paper presents a new framework using sparse representation for simultaneous recognition of facial expression and identity. Our framework is based on the assumption that any facial appearance is a sparse combination of identities and expressions (i.e., one identity and one expression). Our experimental results using the CK+ and MMI face datasets show that the proposed approach outperforms methods that conduct face identification and expression recognition individually.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"8 1","pages":"1066-1073"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82892709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selection of universal features for image classification","authors":"Pedro A. Rodriguez, Nathan G. Drenkow, D. DeMenthon, Zachary H. Koterba, Kathleen Kauffman, Duane C. Cornish, Bart Paulhamus, R. J. Vogelstein","doi":"10.1109/WACV.2014.6836078","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836078","url":null,"abstract":"Neuromimetic algorithms, such as the HMAX algorithm, have been very successful in image classification tasks. However, current implementations of these algorithms do not scale well to large datasets. Often, target-specific features or patches are “learned” ahead of time and then correlated with test images during feature extraction. In this paper, we develop a novel method for selecting a single set of universal features that enables classification across a broad range of image classes. Our method trains multiple Random Forest classifiers using a large dictionary of features and then combines them using a majority voting scheme. This enables the selection of the most discriminative patches based on feature importance measures. Experiments demonstrate the viability of this method using HMAX features as well as the tradeoff between the number of universal features, classification performance, and processing time.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"12 1","pages":"355-362"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85190292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active Clustering with Ensembles for Social structure extraction","authors":"Jeremiah R. Barr, Leonardo A. Cament, K. Bowyer, P. Flynn","doi":"10.1109/WACV.2014.6835999","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835999","url":null,"abstract":"We introduce a method for extracting the social network structure for the persons appearing in a set of video clips. Individuals are unknown, and are not matched against known enrollments. An identity cluster representing an individual is formed by grouping similar-appearing faces from different videos. Each identity cluster is represented by a node in the social network. Two nodes are linked if the faces from their clusters appeared together in one or more video frames. Our approach incorporates a novel active clustering technique to create more accurate identity clusters based on feedback from the user about ambiguously matched faces. The final output consists of one or more network structures that represent the social group(s), and a list of persons who potentially connect multiple social groups. Our results demonstrate the efficacy of the proposed clustering algorithm and network analysis techniques.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"9 1","pages":"969-976"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85994711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video alignment to a common reference","authors":"Rahul Dutta, B. Draper, J. Beveridge","doi":"10.1109/WACV.2014.6836020","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836020","url":null,"abstract":"Handheld videos include unintentional motion (jitter) and often intentional motion (pan and/or zoom). Human viewers prefer to see jitter removed, creating a smoothly moving camera. For video analysis, in contrast, aligning to a fixed stable background is sometimes preferable. This paper presents an algorithm that removes both forms of motion using a novel and efficient way of tracking background points while ignoring moving foreground points. The approach is related to image mosaicing, but the result is a video rather than an enlarged still image. It is also related to multiple object tracking approaches, but simpler since moving objects need not be explicitly tracked. The algorithm presented takes as input a video and returns one or several stabilized videos. Videos are broken into parts when the algorithm detects the background changing and it becomes necessary to fix upon a new background. Our approach assumes the person holding the camera is standing in one place and that objects in motion do not dominate the image. Our algorithm performs better than several previously published approaches when compared on 1,401 handheld videos from the recently released Point-and-Shoot Face Recognition Challenge (PASC). The source code for this algorithm is being made available.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"6 1","pages":"808-815"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78767555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AutoCaption: Automatic caption generation for personal photos","authors":"Krishnan Ramnath, Simon Baker, Lucy Vanderwende, M. El-Saban, Sudipta N. Sinha, A. Kannan, N. Hassan, Michel Galley, Yi Yang, Deva Ramanan, Alessandro Bergamo, L. Torresani","doi":"10.1109/WACV.2014.6835988","DOIUrl":"https://doi.org/10.1109/WACV.2014.6835988","url":null,"abstract":"AutoCaption is a system that helps a smartphone user generate a caption for their photos. It operates by uploading the photo to a cloud service where a number of parallel modules are applied to recognize a variety of entities and relations. The outputs of the modules are combined to generate a large set of candidate captions, which are returned to the phone. The phone client includes a convenient user interface that allows users to select their favorite caption, reorder, add, or delete words to obtain the grammatical style they prefer. The user can also select from multiple candidates returned by the recognition modules.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"34 1","pages":"1050-1057"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79291300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fully implicit alternating direction method of multipliers for the minimization of convex problems with an application to motion segmentation","authors":"Karin Tichmann, O. Junge","doi":"10.1109/WACV.2014.6836018","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836018","url":null,"abstract":"Motivated by a variational formulation of the motion segmentation problem, we propose a fully implicit variant of the (linearized) alternating direction method of multipliers for the minimization of convex functionals over a convex set. The new scheme does not require a step size restriction for stability and thus approaches the minimum using considerably fewer iterates. In numerical experiments on standard image sequences, the scheme often significantly outperforms other state-of-the-art methods.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"57 1","pages":"823-830"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84567755","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive video segmentation using occlusion boundaries and temporally coherent superpixels","authors":"Radu Dondera, Vlad I. Morariu, Yulu Wang, L. Davis","doi":"10.1109/WACV.2014.6836023","DOIUrl":"https://doi.org/10.1109/WACV.2014.6836023","url":null,"abstract":"We propose an interactive video segmentation system built on the basis of occlusion and long term spatio-temporal structure cues. User supervision is incorporated in a superpixel graph clustering framework that differs crucially from prior art in that it modifies the graph according to the output of an occlusion boundary detector. Working with long temporal intervals (up to 100 frames) enables our system to significantly reduce annotation effort with respect to state of the art systems. Even though the segmentation results are less than perfect, they are obtained efficiently and can be used in weakly supervised learning from video or for video content description. We do not rely on a discriminative object appearance model and allow extracting multiple foreground objects together, saving user time if more than one object is present. Additional experiments with unsupervised clustering based on occlusion boundaries demonstrate the importance of this cue for video segmentation and thus validate our system design.","PeriodicalId":73325,"journal":{"name":"IEEE Winter Conference on Applications of Computer Vision. IEEE Winter Conference on Applications of Computer Vision","volume":"180 1","pages":"784-791"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88468919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}