{"title":"On Projective Reconstruction in Arbitrary Dimensions","authors":"B. Nasihatkon, R. Hartley, J. Trumpf","doi":"10.1109/CVPR.2014.68","DOIUrl":"https://doi.org/10.1109/CVPR.2014.68","url":null,"abstract":"We study the theory of projective reconstruction for multiple projections from an arbitrary dimensional projective space into lower-dimensional spaces. This problem is important due to its applications in the analysis of dynamical scenes. The current theory, due to Hartley and Schaffalitzky, is based on the Grassmann tensor, generalizing the ideas of fundamental matrix, trifocal tensor and quadrifocal tensor used in the well-studied case of 3D to 2D projections. We present a theory whose point of departure is the projective equations rather than the Grassmann tensor. This is a better fit for the analysis of approaches such as bundle adjustment and projective factorization which seek to directly solve the projective equations. In a first step, we prove that there is a unique Grassmann tensor corresponding to each set of image points, a question that remained open in the work of Hartley and Schaffalitzky. Then, we prove that projective equivalence follows from the set of projective equations given certain conditions on the estimated camera-point setup or the estimated projective depths. Finally, we demonstrate how wrong solutions to the projective factorization problem can happen, and classify such degenerate solutions based on the zero patterns in the estimated depth matrix.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130338969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mathieu Aubry, Daniel Maturana, Alexei A. Efros, Bryan C. Russell, Josef Sivic
{"title":"Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models","authors":"Mathieu Aubry, Daniel Maturana, Alexei A. Efros, Bryan C. Russell, Josef Sivic","doi":"10.1109/CVPR.2014.487","DOIUrl":"https://doi.org/10.1109/CVPR.2014.487","url":null,"abstract":"This paper poses object category detection in images as a type of 2D-to-3D alignment problem, utilizing the large quantities of 3D CAD models that have been made publicly available online. Using the \"chair\" class as a running example, we propose an exemplar-based 3D category representation, which can explicitly model chairs of different styles as well as the large variation in viewpoint. We develop an approach to establish part-based correspondences between 3D CAD models and real photographs. This is achieved by (i) representing each 3D model using a set of view-dependent mid-level visual elements learned from synthesized views in a discriminative fashion, (ii) carefully calibrating the individual element detectors on a common dataset of negative images, and (iii) matching visual elements to the test image allowing for small mutual deformations but preserving the viewpoint and style constraints. We demonstrate the ability of our system to align 3D models with 2D objects in the challenging PASCAL VOC images, which depict a wide variety of chairs in complex scenes.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130548144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aseem Behl, Iiit Hyderabad, India C V Jawahar, India M Pawan Kumar
{"title":"Optimizing Average Precision Using Weakly Supervised Data","authors":"Aseem Behl, Iiit Hyderabad, India C V Jawahar, India M Pawan Kumar","doi":"10.1109/CVPR.2014.133","DOIUrl":"https://doi.org/10.1109/CVPR.2014.133","url":null,"abstract":"The performance of binary classification tasks, such as action classification and object detection, is often measured in terms of the average precision (AP). Yet it is common practice in computer vision to employ the support vector machine (SVM) classifier, which optimizes a surrogate 0-1 loss. The popularity of SVM can be attributed to its empirical performance. Specifically, in fully supervised settings, SVM tends to provide similar accuracy to the AP-SVM classifier, which directly optimizes an AP-based loss. However, we hypothesize that in the significantly more challenging and practically useful setting of weakly supervised learning, it becomes crucial to optimize the right accuracy measure. In order to test this hypothesis, we propose a novel latent AP-SVM that minimizes a carefully designed upper bound on the AP-based loss function over weakly supervised samples. Using publicly available datasets, we demonstrate the advantage of our approach over standard loss-based binary classifiers on two challenging problems: action classification and character recognition.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"22 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127007701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convolutional Neural Networks for No-Reference Image Quality Assessment","authors":"Le Kang, Peng Ye, Yi Li, D. Doermann","doi":"10.1109/CVPR.2014.224","DOIUrl":"https://doi.org/10.1109/CVPR.2014.224","url":null,"abstract":"In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image. Taking image patches as input, the CNN works in the spatial domain without using hand-crafted features that are employed by most previous methods. The network consists of one convolutional layer with max and min pooling, two fully connected layers and an output node. Within the network structure, feature learning and regression are integrated into one optimization process, which leads to a more effective model for estimating image quality. This approach achieves state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments. Further experiments on images with local distortions demonstrate the local quality estimation ability of our CNN, which is rarely reported in previous literature.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129186155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abner Guzmán-Rivera, Pushmeet Kohli, B. Glocker, J. Shotton, T. Sharp, A. Fitzgibbon, S. Izadi
{"title":"Multi-output Learning for Camera Relocalization","authors":"Abner Guzmán-Rivera, Pushmeet Kohli, B. Glocker, J. Shotton, T. Sharp, A. Fitzgibbon, S. Izadi","doi":"10.1109/CVPR.2014.146","DOIUrl":"https://doi.org/10.1109/CVPR.2014.146","url":null,"abstract":"We address the problem of estimating the pose of a cam- era relative to a known 3D scene from a single RGB-D frame. We formulate this problem as inversion of the generative rendering procedure, i.e., we want to find the camera pose corresponding to a rendering of the 3D scene model that is most similar with the observed input. This is a non-convex optimization problem with many local optima. We propose a hybrid discriminative-generative learning architecture that consists of: (i) a set of M predictors which generate M camera pose hypotheses, and (ii) a 'selector' or 'aggregator' that infers the best pose from the multiple pose hypotheses based on a similarity function. We are interested in predictors that not only produce good hypotheses but also hypotheses that are different from each other. Thus, we propose and study methods for learning 'marginally relevant' predictors, and compare their performance when used with different selection procedures. We evaluate our method on a recently released 3D reconstruction dataset with challenging camera poses, and scene variability. Experiments show that our method learns to make multiple predictions that are marginally relevant and can effectively select an accurate prediction. Furthermore, our method outperforms the state-of-the-art discriminative approach for camera relocalization.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123851126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities","authors":"Hilde Kuehne, A. B. Arslan, Thomas Serre","doi":"10.1109/CVPR.2014.105","DOIUrl":"https://doi.org/10.1109/CVPR.2014.105","url":null,"abstract":"This paper describes a framework for modeling human activities as temporally structured processes. Our approach is motivated by the inherently hierarchical nature of human activities and the close correspondence between human actions and speech: We model action units using Hidden Markov Models, much like words in speech. These action units then form the building blocks to model complex human activities as sentences using an action grammar. To evaluate our approach, we collected a large dataset of daily cooking activities: The dataset includes a total of 52 participants, each performing a total of 10 cooking activities in multiple real-life kitchens, resulting in over 77 hours of video footage. We evaluate the HTK toolkit, a state-of-the-art speech recognition engine, in combination with multiple video feature descriptors, for both the recognition of cooking activities (e.g., making pancakes) as well as the semantic parsing of videos into action units (e.g., cracking eggs). Our results demonstrate the benefits of structured temporal generative approaches over existing discriminative approaches in coping with the complexity of human daily life activities.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123877634","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tell Me What You See and I Will Show You Where It Is","authors":"Jia Xu, A. Schwing, R. Urtasun","doi":"10.1109/CVPR.2014.408","DOIUrl":"https://doi.org/10.1109/CVPR.2014.408","url":null,"abstract":"We tackle the problem of weakly labeled semantic segmentation, where the only source of annotation are image tags encoding which classes are present in the scene. This is an extremely difficult problem as no pixel-wise labelings are available, not even at training time. In this paper, we show that this problem can be formalized as an instance of learning in a latent structured prediction framework, where the graphical model encodes the presence and absence of a class as well as the assignments of semantic labels to superpixels. As a consequence, we are able to leverage standard algorithms with good theoretical properties. We demonstrate the effectiveness of our approach using the challenging SIFT-flow dataset and show average per-class accuracy improvements of 7% over the state-of-the-art.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"24 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123214080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial Symmetry in Polynomial Systems and Its Applications in Computer Vision","authors":"Yubin Kuang, Yinqiang Zheng, Kalle Åström","doi":"10.1109/CVPR.2014.63","DOIUrl":"https://doi.org/10.1109/CVPR.2014.63","url":null,"abstract":"Algorithms for solving systems of polynomial equations are key components for solving geometry problems in computer vision. Fast and stable polynomial solvers are essential for numerous applications e.g. minimal problems or finding for all stationary points of certain algebraic errors. Recently, full symmetry in the polynomial systems has been utilized to simplify and speed up state-of-the-art polynomial solvers based on Gröbner basis method. In this paper, we further explore partial symmetry (i.e. where the symmetry lies in a subset of the variables) in the polynomial systems. We develop novel numerical schemes to utilize such partial symmetry. We then demonstrate the advantage of our schemes in several computer vision problems. In both synthetic and real experiments, we show that utilizing partial symmetry allow us to obtain faster and more accurate polynomial solvers than the general solvers.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123348415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Laplacian Coordinates for Seeded Image Segmentation","authors":"Wallace Casaca, L. G. Nonato, G. Taubin","doi":"10.1109/CVPR.2014.56","DOIUrl":"https://doi.org/10.1109/CVPR.2014.56","url":null,"abstract":"Seed-based image segmentation methods have gained much attention lately, mainly due to their good performance in segmenting complex images with little user interaction. Such popularity leveraged the development of many new variations of seed-based image segmentation techniques, which vary greatly regarding mathematical formulation and complexity. Most existing methods in fact rely on complex mathematical formulations that typically do not guarantee unique solution for the segmentation problem while still being prone to be trapped in local minima. In this work we present a novel framework for seed-based image segmentation that is mathematically simple, easy to implement, and guaranteed to produce a unique solution. Moreover, the formulation holds an anisotropic behavior, that is, pixels sharing similar attributes are kept closer to each other while big jumps are naturally imposed on the boundary between image regions, thus ensuring better fitting on object boundaries. We show that the proposed framework outperform state-of-the-art techniques in terms of quantitative quality metrics as well as qualitative visual results.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121200051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Matches in a Haystack: A Max-Pooling Strategy for Graph Matching in the Presence of Outliers","authors":"Minsu Cho, Jian Sun, Olivier Duchenne, J. Ponce","doi":"10.1109/CVPR.2014.268","DOIUrl":"https://doi.org/10.1109/CVPR.2014.268","url":null,"abstract":"A major challenge in real-world feature matching problems is to tolerate the numerous outliers arising in typical visual tasks. Variations in object appearance, shape, and structure within the same object class make it harder to distinguish inliers from outliers due to clutters. In this paper, we propose a max-pooling approach to graph matching, which is not only resilient to deformations but also remarkably tolerant to outliers. The proposed algorithm evaluates each candidate match using its most promising neighbors, and gradually propagates the corresponding scores to update the neighbors. As final output, it assigns a reliable score to each match together with its supporting neighbors, thus providing contextual information for further verification. We demonstrate the robustness and utility of our method with synthetic and real image experiments.","PeriodicalId":319578,"journal":{"name":"2014 IEEE Conference on Computer Vision and Pattern Recognition","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114114921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}