{"title":"Refractive height fields from single and multiple images","authors":"Qi Shan, Sameer Agarwal, B. Curless","doi":"10.1109/CVPR.2012.6247687","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247687","url":null,"abstract":"We propose a novel framework for reconstructing homogenous, transparent, refractive height-fields from a single viewpoint. The height-field is imaged against a known planar background, or sequence of backgrounds. Unlike existing approaches that do a point-by-point reconstruction - which is known to have intractable ambiguities - our method estimates and optimizes for the entire height-field at the same time. The formulation supports shape recovery from measured distortions (deflections) or directly from the images themselves, including from a single image. We report results for a variety of refractive height-fields showing significant improvement over prior art.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129896096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust camera self-calibration from monocular images of Manhattan worlds","authors":"H. Wildenauer, A. Hanbury","doi":"10.1109/CVPR.2012.6248008","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6248008","url":null,"abstract":"We focus on the detection of orthogonal vanishing points using line segments extracted from a single view, and using these for camera self-calibration. Recent methods view this problem as a two-stage process. Vanishing points are extracted through line segment clustering and subsequently likely orthogonal candidates are selected for calibration. Unfortunately, such an approach is easily distracted by the presence of clutter. Furthermore, geometric constraints imposed by the camera and scene orthogonality are not enforced during detection, leading to inaccurate results which are often inadmissible for calibration. To overcome these limitations, we present a RANSAC-based approach using a minimal solution for estimating three orthogonal vanishing points and focal length from a set of four lines, aligned with either two or three orthogonal directions. In addition, we propose to refine the estimates using an efficient and robust Maximum Likelihood Estimator. Extensive experiments on standard datasets show that our contributions result in significant improvements over the state-of-the-art.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128895081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved facial expression recognition via uni-hyperplane classification","authors":"Sien W. Chew, S. Lucey, P. Lucey, S. Sridharan, Jeff F. Conn","doi":"10.1109/CVPR.2012.6247973","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247973","url":null,"abstract":"Large margin learning approaches, such as support vector machines (SVM), have been successfully applied to numerous classification tasks, especially for automatic facial expression recognition. The risk of such approaches however, is their sensitivity to large margin losses due to the influence from noisy training examples and outliers which is a common problem in the area of affective computing (i.e., manual coding at the frame level is tedious so coarse labels are normally assigned). In this paper, we leverage the relaxation of the parallel-hyperplanes constraint and propose the use of modified correlation filters (MCF). The MCF is similar in spirit to SVMs and correlation filters, but with the key difference of optimizing only a single hyperplane. We demonstrate the superiority of MCF over current techniques on a battery of experiments.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"60 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126950737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust non-rigid registration of 2D and 3D graphs","authors":"Eduard Serradell, Przemyslaw Glowacki, J. Kybic, F. Moreno-Noguer, P. Fua","doi":"10.1109/CVPR.2012.6247776","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247776","url":null,"abstract":"We present a new approach to matching graphs embedded in ℝ2 or ℝ3. Unlike earlier methods, our approach does not rely on the similarity of local appearance features, does not require an initial alignment, can handle partial matches, and can cope with non-linear deformations and topological differences. To handle arbitrary non-linear deformations, we represent them as Gaussian Processes. In the absence of appearance information, we iteratively establish correspondences between graph nodes, update the structure accordingly, and use the current mapping estimate to find the most likely correspondences that will be used in the next iteration. This makes the computation tractable. We demonstrate the effectiveness of our approach first on synthetic cases and then on angiography data, retinal fundus images, and microscopy image stacks acquired at very different resolutions.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121841957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modulation transfer function of patch-based stereo systems","authors":"Ronny Klowsky, Arjan Kuijper, M. Goesele","doi":"10.1109/CVPR.2012.6247825","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247825","url":null,"abstract":"A widely used technique to recover a 3D surface from photographs is patch-based (multi-view) stereo reconstruction. Current methods are able to reproduce fine surface details, they are however limited by the sampling density and the patch size used for reconstruction. We show that there is a systematic error in the reconstruction depending on the details in the unknown surface (frequencies) and the reconstruction resolution. For this purpose we present a theoretical analysis of patch-based depth reconstruction. We prove that our model of the reconstruction process yields a linear system, allowing us to apply the transfer (or system) function concept. We derive the modulation transfer function theoretically and validate it experimentally on synthetic examples using rendered images as well as on photographs of a 3D test target. Our analysis proves that there is a significant but predictable amplitude loss in reconstructions of fine scale details. In a first experiment on real-world data we show how this can be compensated for within the limits of noise and reconstruction accuracy by an inverse transfer function in frequency space.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121958994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cats and dogs","authors":"Omkar M. Parkhi, A. Vedaldi, Andrew Zisserman, C. V. Jawahar","doi":"10.1109/CVPR.2012.6248092","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6248092","url":null,"abstract":"We investigate the fine grained object categorization problem of determining the breed of animal from an image. To this end we introduce a new annotated dataset of pets covering 37 different breeds of cats and dogs. The visual problem is very challenging as these animals, particularly cats, are very deformable and there can be quite subtle differences between the breeds. We make a number of contributions: first, we introduce a model to classify a pet breed automatically from an image. The model combines shape, captured by a deformable part model detecting the pet face, and appearance, captured by a bag-of-words model that describes the pet fur. Fitting the model involves automatically segmenting the animal in the image. Second, we compare two classification approaches: a hierarchical one, in which a pet is first assigned to the cat or dog family and then to a breed, and a flat one, in which the breed is obtained directly. We also investigate a number of animal and image orientated spatial layouts. These models are very good: they beat all previously published results on the challenging ASIRRA test (cat vs dog discrimination). When applied to the task of discriminating the 37 different breeds of pets, the models obtain an average accuracy of about 59%, a very encouraging result considering the difficulty of the problem.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"41 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121002518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seeing double without confusion: Structure-from-motion in highly ambiguous scenes","authors":"Nianjuan Jiang, P. Tan, L. Cheong","doi":"10.1109/CVPR.2012.6247834","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247834","url":null,"abstract":"3D reconstruction from an unordered set of images may fail due to incorrect epipolar geometries (EG) between image pairs arising from ambiguous feature correspondences. Previous methods often analyze the consistency between different EGs, and regard the largest subset of self-consistent EGs as correct. However, as demonstrated in [14], such a largest self-consistent set often corresponds to incorrect result, especially when there are duplicate structures in the scene. We propose a novel optimization criteria based on the idea of `missing correspondences'. The global minimum of our optimization objective function is associated with the correct solution. We then design an efficient algorithm for minimization, whose convergence to a local minimum is guaranteed. Experimental results show our method outperforms the state-of-the-art.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"225 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116385239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach","authors":"Lixin Duan, Dong Xu, Shih-Fu Chang","doi":"10.1109/CVPR.2012.6247819","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247819","url":null,"abstract":"Recent work has demonstrated the effectiveness of domain adaptation methods for computer vision applications. In this work, we propose a new multiple source domain adaptation method called Domain Selection Machine (DSM) for event recognition in consumer videos by leveraging a large number of loosely labeled web images from different sources (e.g., Flickr.com and Photosig.com), in which there are no labeled consumer videos. Specifically, we first train a set of SVM classifiers (referred to as source classifiers) by using the SIFT features of web images from different source domains. We propose a new parametric target decision function to effectively integrate the static SIFT features from web images/video keyframes and the spacetime (ST) features from consumer videos. In order to select the most relevant source domains, we further introduce a new data-dependent regularizer into the objective of Support Vector Regression (SVR) using the ϵ-insensitive loss, which enforces the target classifier shares similar decision values on the unlabeled consumer videos with the selected source classifiers. Moreover, we develop an alternating optimization algorithm to iteratively solve the target decision function and a domain selection vector which indicates the most relevant source domains. Extensive experiments on three real-world datasets demonstrate the effectiveness of our proposed method DSM over the state-of-the-art by a performance gain up to 46.41%.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126842081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A two-stage approach to blind spatially-varying motion deblurring","authors":"Hui Ji, Kang Wang","doi":"10.1109/CVPR.2012.6247660","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247660","url":null,"abstract":"Many blind motion deblur methods model the motion blur as a spatially invariant convolution process. However, motion blur caused by the camera movement in 3D space during shutter time often leads to spatially varying blurring effect over the image. In this paper, we proposed an efficient two-stage approach to remove spatially-varying motion blurring from a single photo. There are three main components in our approach: (i) a minimization method of estimating region-wise blur kernels by using both image information and correlations among neighboring kernels, (ii) an interpolation scheme of constructing pixel-wise blur matrix from region-wise blur kernels, and (iii) a non-blind deblurring method robust to kernel errors. The experiments showed that the proposed method outperformed the existing software based approaches on tested real images.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125297437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracking the articulated motion of two strongly interacting hands","authors":"I. Oikonomidis, Nikolaos Kyriazis, Antonis A. Argyros","doi":"10.1109/CVPR.2012.6247885","DOIUrl":"https://doi.org/10.1109/CVPR.2012.6247885","url":null,"abstract":"We propose a method that relies on markerless visual observations to track the full articulation of two hands that interact with each-other in a complex, unconstrained manner. We formulate this as an optimization problem whose 54-dimensional parameter space represents all possible configurations of two hands, each represented as a kinematic structure with 26 Degrees of Freedom (DoFs). To solve this problem, we employ Particle Swarm Optimization (PSO), an evolutionary, stochastic optimization method with the objective of finding the two-hands configuration that best explains observations provided by an RGB-D sensor. To the best of our knowledge, the proposed method is the first to attempt and achieve the articulated motion tracking of two strongly interacting hands. Extensive quantitative and qualitative experiments with simulated and real world image sequences demonstrate that an accurate and efficient solution of this problem is indeed feasible.","PeriodicalId":177454,"journal":{"name":"2012 IEEE Conference on Computer Vision and Pattern Recognition","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124155806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}