{"title":"Joint Segmentation and Pose Tracking of Human in Natural Videos","authors":"Taegyu Lim, Seunghoon Hong, Bohyung Han, J. Han","doi":"10.1109/ICCV.2013.108","DOIUrl":"https://doi.org/10.1109/ICCV.2013.108","url":null,"abstract":"We propose an on-line algorithm to extract a human by foreground/background segmentation and estimate pose of the human from the videos captured by moving cameras. We claim that a virtuous cycle can be created by appropriate interactions between the two modules to solve individual problems. This joint estimation problem is divided into two sub problems, foreground/background segmentation and pose tracking, which alternate iteratively for optimization, segmentation step generates foreground mask for human pose tracking, and human pose tracking step provides fore-ground response map for segmentation. The final solution is obtained when the iterative procedure converges. We evaluate our algorithm quantitatively and qualitatively in real videos involving various challenges, and present its outstanding performance compared to the state-of-the-art techniques for segmentation and pose estimation.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"1 1","pages":"833-840"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83112690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Latent Data Association: Bayesian Model Selection for Multi-target Tracking","authors":"Aleksandr V. Segal, I. Reid","doi":"10.1109/ICCV.2013.361","DOIUrl":"https://doi.org/10.1109/ICCV.2013.361","url":null,"abstract":"We propose a novel parametrization of the data association problem for multi-target tracking. In our formulation, the number of targets is implicitly inferred together with the data association, effectively solving data association and model selection as a single inference problem. The novel formulation allows us to interpret data association and tracking as a single Switching Linear Dynamical System (SLDS). We compute an approximate posterior solution to this problem using a dynamic programming/message passing technique. This inference-based approach allows us to incorporate richer probabilistic models into the tracking system. In particular, we incorporate inference over inliers/outliers and track termination times into the system. We evaluate our approach on publicly available datasets and demonstrate results competitive with, and in some cases exceeding the state of the art.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"31 1","pages":"2904-2911"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83728544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-dense Visual Odometry for a Monocular Camera","authors":"Jakob J. Engel, Jürgen Sturm, D. Cremers","doi":"10.1109/ICCV.2013.183","DOIUrl":"https://doi.org/10.1109/ICCV.2013.183","url":null,"abstract":"We propose a fundamentally novel approach to real-time visual odometry for a monocular camera. It allows to benefit from the simplicity and accuracy of dense tracking - which does not depend on visual features - while running in real-time on a CPU. The key idea is to continuously estimate a semi-dense inverse depth map for the current frame, which in turn is used to track the motion of the camera using dense image alignment. More specifically, we estimate the depth of all pixels which have a non-negligible image gradient. Each estimate is represented as a Gaussian probability distribution over the inverse depth. We propagate this information over time, and update it with new measurements as new images arrive. In terms of tracking accuracy and computational speed, the proposed method compares favorably to both state-of-the-art dense and feature-based visual odometry and SLAM algorithms. As our method runs in real-time on a CPU, it is of large practical value for robotics and augmented reality applications.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"19 1","pages":"1449-1456"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84619574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pictorial Human Spaces: How Well Do Humans Perceive a 3D Articulated Pose?","authors":"Elisabeta Marinoiu, Dragos Papava, C. Sminchisescu","doi":"10.1109/ICCV.2013.163","DOIUrl":"https://doi.org/10.1109/ICCV.2013.163","url":null,"abstract":"Human motion analysis in images and video is a central computer vision problem. Yet, there are no studies that reveal how humans perceive other people in images and how accurate they are. In this paper we aim to unveil some of the processing-as well as the levels of accuracy-involved in the 3D perception of people from images by assessing the human performance. Our contributions are: (1) the construction of an experimental apparatus that relates perception and measurement, in particular the visual and kinematic performance with respect to 3D ground truth when the human subject is presented an image of a person in a given pose, (2) the creation of a dataset containing images, articulated 2D and 3D pose ground truth, as well as synchronized eye movement recordings of human subjects, shown a variety of human body configurations, both easy and difficult, as well as their 're-enacted' 3D poses, (3) quantitative analysis revealing the human performance in 3D pose re-enactment tasks, the degree of stability in the visual fixation patterns of human subjects, and the way it correlates with different poses. We also discuss the implications of our findings for the construction of visual human sensing systems.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"46 1","pages":"1289-1296"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84653928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handwritten Word Spotting with Corrected Attributes","authors":"Jon Almazán, Albert Gordo, A. Fornés, Ernest Valveny","doi":"10.1109/ICCV.2013.130","DOIUrl":"https://doi.org/10.1109/ICCV.2013.130","url":null,"abstract":"We propose an approach to multi-writer word spotting, where the goal is to find a query word in a dataset comprised of document images. We propose an attributes-based approach that leads to a low-dimensional, fixed-length representation of the word images that is fast to compute and, especially, fast to compare. This approach naturally leads to an unified representation of word images and strings, which seamlessly allows one to indistinctly perform query-by-example, where the query is an image, and query-by-string, where the query is a string. We also propose a calibration scheme to correct the attributes scores based on Canonical Correlation Analysis that greatly improves the results on a challenging dataset. We test our approach on two public datasets showing state-of-the-art results.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"1 1","pages":"1017-1024"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88596867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Reflection Change for Automatic Reflection Removal","authors":"Yu Li, M. S. Brown","doi":"10.1109/ICCV.2013.302","DOIUrl":"https://doi.org/10.1109/ICCV.2013.302","url":null,"abstract":"This paper introduces an automatic method for removing reflection interference when imaging a scene behind a glass surface. Our approach exploits the subtle changes in the reflection with respect to the background in a small set of images taken at slightly different view points. Key to this idea is the use of SIFT-flow to align the images such that a pixel-wise comparison can be made across the input set. Gradients with variation across the image set are assumed to belong to the reflected scenes while constant gradients are assumed to belong to the desired background scene. By correctly labelling gradients belonging to reflection or background, the background scene can be separated from the reflection interference. Unlike previous approaches that exploit motion, our approach does not make any assumptions regarding the background or reflected scenes' geometry, nor requires the reflection to be static. This makes our approach practical for use in casual imaging scenarios. Our approach is straight forward and produces good results compared with existing methods.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"10 1","pages":"2432-2439"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87179389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Solution to the Absolute Pose Problem with Unknown Radial Distortion and Focal Length","authors":"Z. Kukelova, Martin Bujnak, T. Pajdla","doi":"10.1109/ICCV.2013.350","DOIUrl":"https://doi.org/10.1109/ICCV.2013.350","url":null,"abstract":"The problem of determining the absolute position and orientation of a camera from a set of 2D-to-3D point correspondences is one of the most important problems in computer vision with a broad range of applications. In this paper we present a new solution to the absolute pose problem for camera with unknown radial distortion and unknown focal length from five 2D-to-3D point correspondences. Our new solver is numerically more stable, more accurate, and significantly faster than the existing state-of-the-art minimal four point absolute pose solvers for this problem. Moreover, our solver results in less solutions and can handle larger radial distortions. The new solver is straightforward and uses only simple concepts from linear algebra. Therefore it is simpler than the state-of-the-art Groebner basis solvers. We compare our new solver with the existing state-of-the-art solvers and show its usefulness on synthetic and real datasets.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"48 1","pages":"2816-2823"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88148469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coherent Motion Segmentation in Moving Camera Videos Using Optical Flow Orientations","authors":"M. Narayana, A. Hanson, E. Learned-Miller","doi":"10.1109/ICCV.2013.199","DOIUrl":"https://doi.org/10.1109/ICCV.2013.199","url":null,"abstract":"In moving camera videos, motion segmentation is commonly performed using the image plane motion of pixels, or optical flow. However, objects that are at different depths from the camera can exhibit different optical flows even if they share the same real-world motion. This can cause a depth-dependent segmentation of the scene. Our goal is to develop a segmentation algorithm that clusters pixels that have similar real-world motion irrespective of their depth in the scene. Our solution uses optical flow orientations instead of the complete vectors and exploits the well-known property that under camera translation, optical flow orientations are independent of object depth. We introduce a probabilistic model that automatically estimates the number of observed independent motions and results in a labeling that is consistent with real-world motion in the scene. The result of our system is that static objects are correctly identified as one segment, even if they are at different depths. Color features and information from previous frames in the video sequence are used to correct occasional errors due to the orientation-based segmentation. We present results on more than thirty videos from different benchmarks. The system is particularly robust on complex background scenes containing objects at significantly different depths.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"49 1","pages":"1577-1584"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91310511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Group Sparsity and Geometry Constrained Dictionary Learning for Action Recognition from Depth Maps","authors":"Jiajia Luo, Wei Wang, H. Qi","doi":"10.1109/ICCV.2013.227","DOIUrl":"https://doi.org/10.1109/ICCV.2013.227","url":null,"abstract":"Human action recognition based on the depth information provided by commodity depth sensors is an important yet challenging task. The noisy depth maps, different lengths of action sequences, and free styles in performing actions, may cause large intra-class variations. In this paper, a new framework based on sparse coding and temporal pyramid matching (TPM) is proposed for depth-based human action recognition. Especially, a discriminative class-specific dictionary learning algorithm is proposed for sparse coding. By adding the group sparsity and geometry constraints, features can be well reconstructed by the sub-dictionary belonging to the same class, and the geometry relationships among features are also kept in the calculated coefficients. The proposed approach is evaluated on two benchmark datasets captured by depth cameras. Experimental results show that the proposed algorithm repeatedly achieves superior performance to the state of the art algorithms. Moreover, the proposed dictionary learning method also outperforms classic dictionary learning approaches.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"25 1","pages":"1809-1816"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90457625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Enhanced Structure-from-Motion Paradigm Based on the Absolute Dual Quadric and Images of Circular Points","authors":"L. Calvet, Pierre Gurdjos","doi":"10.1109/ICCV.2013.126","DOIUrl":"https://doi.org/10.1109/ICCV.2013.126","url":null,"abstract":"This work aims at introducing a new unified Structure from Motion (SfM) paradigm in which images of circular point-pairs can be combined with images of natural points. An imaged circular point-pair encodes the 2D Euclidean structure of a world plane and can easily be derived from the image of a planar shape, especially those including circles. A classical SfM method generally runs two steps: first a projective factorization of all matched image points (into projective cameras and points) and second a camera self calibration that updates the obtained world from projective to Euclidean. This work shows how to introduce images of circular points in these two SfM steps while its key contribution is to provide the theoretical foundations for combining \"classical\" linear self-calibration constraints with additional ones derived from such images. We show that the two proposed SfM steps clearly contribute to better results than the classical approach. We validate our contributions on synthetic and real images.","PeriodicalId":6351,"journal":{"name":"2013 IEEE International Conference on Computer Vision","volume":"24 1","pages":"985-992"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73441409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}