{"title":"Recursive MDL via graph cuts: Application to segmentation","authors":"Lena Gorelick, Andrew Delong, O. Veksler, Yuri Boykov","doi":"10.1109/ICCV.2011.6126330","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126330","url":null,"abstract":"We propose a novel patch-based image representation that is useful because it (1) inherently detects regions with repetitive structure at multiple scales and (2) yields a parameterless hierarchical segmentation. We describe an image by breaking it into coherent regions where each region is well-described (easily reconstructed) by repeatedly instantiating a patch using a set of simple transformations. In other words, a good segment is one that has sufficient repetition of some pattern, and a patch is useful if it contains a pattern that is repeated in the image.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90448657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised learning of a scene-specific coarse gaze estimator","authors":"Ben Benfold, I. Reid","doi":"10.1109/ICCV.2011.6126516","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126516","url":null,"abstract":"We present a method to estimate the coarse gaze directions of people from surveillance data. Unlike previous work we aim to do this without recourse to a large hand-labelled corpus of training data. In contrast we propose a method for learning a classifier without any hand labelled data using only the output from an automatic tracking system. A Conditional Random Field is used to model the interactions between the head motion, walking direction, and appearance to recover the gaze directions and simultaneously train randomised decision tree classifiers. Experiments demonstrate performance exceeding that of conventionally trained classifiers on two large surveillance datasets.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83935576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative figure-centric models for joint action localization and recognition","authors":"Tian Lan, Yang Wang, Greg Mori","doi":"10.1109/ICCV.2011.6126472","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126472","url":null,"abstract":"In this paper we develop an algorithm for action recognition and localization in videos. The algorithm uses a figure-centric visual word representation. Different from previous approaches it does not require reliable human detection and tracking as input. Instead, the person location is treated as a latent variable that is inferred simultaneously with action recognition. A spatial model for an action is learned in a discriminative fashion under a figure-centric representation. Temporal smoothness over video sequences is also enforced. We present results on the UCF-Sports dataset, verifying the effectiveness of our model in situations where detection and tracking of individuals is challenging.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72901223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A joint learning framework for attribute models and object descriptions","authors":"D. Mahajan, Sundararajan Sellamanickam, Vinod Nair","doi":"10.1109/ICCV.2011.6126373","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126373","url":null,"abstract":"We present a new approach to learning attribute-based descriptions of objects. Unlike earlier works, we do not assume that the descriptions are hand-labeled. Instead, our approach jointly learns both the attribute classifiers and the descriptions from data. By incorporating class information into the attribute classifier learning, we get an attribute-level representation that generalizes well to both unseen examples of known classes and unseen classes. We consider two different settings, one with unlabeled images available for learning, and another without. The former corresponds to a novel transductive setting where the unlabeled images can come from new classes. Results from Animals with Attributes and a-Yahoo, a-Pascal benchmark datasets show that the learned representations give similar or even better accuracy than the hand-labeled descriptions.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77068419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Active clustering of document fragments using information derived from both images and catalogs","authors":"Lior Wolf, Lior Litwak, N. Dershowitz, Roni Shweka, Y. Choueka","doi":"10.1109/ICCV.2011.6126428","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126428","url":null,"abstract":"Many significant historical corpora contain leaves that are mixed up and no longer bound in their original state as multi-page documents. The reconstruction of old manuscripts from a mix of disjoint leaves can therefore be of paramount importance to historians and literary scholars. Previously, it was shown that visual similarity provides meaningful pair-wise similarities between handwritten leaves. Here, we go a step further and suggest a semiautomatic clustering tool that helps reconstruct the original documents. The proposed solution is based on a graphical model that makes inferences based on catalog information provided for each leaf as well as on the pairwise similarities of handwriting. Several novel active clustering techniques are explored, and the solution is applied to a significant part of the Cairo Genizah, where the problem of joining leaves remains unsolved even after a century of extensive study by hundreds of scholars.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77122578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning equivariant structured output SVM regressors","authors":"A. Vedaldi, Matthew B. Blaschko, Andrew Zisserman","doi":"10.1109/ICCV.2011.6126339","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126339","url":null,"abstract":"Equivariance and invariance are often desired properties of a computer vision system. However, currently available strategies generally rely on virtual sampling, leaving open the question of how many samples are necessary, on the use of invariant feature representations, which can mistakenly discard information relevant to the vision task, or on the use of latent variable models, which result in non-convex training and expensive inference at test time. We propose here a generalization of structured output SVM regressors that can incorporate equivariance and invariance into a convex training procedure, enabling the incorporation of large families of transformations, while maintaining optimality and tractability. Importantly, test time inference does not require the estimation of latent variables, resulting in highly efficient objective functions. This results in a natural formulation for treating equivariance and invariance that is easily implemented as an adaptation of off-the-shelf optimization software, obviating the need for ad hoc sampling strategies. Theoretical results relating to vicinal risk, and experiments on challenging aerial car and pedestrian detection tasks show the effectiveness of the proposed solution.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85012249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-class semi-supervised SVMs with Positiveness Exclusive Regularization","authors":"Xiaobai Liu, Xiao-Tong Yuan, Shuicheng Yan, Hai Jin","doi":"10.1109/ICCV.2011.6126399","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126399","url":null,"abstract":"In this work, we address the problem of multi-class classification problem in semi-supervised setting. A regularized multi-task learning approach is presented to train multiple binary-class Semi-Supervised Support Vector Machines (S3VMs) using the one-vs-rest strategy within a joint framework. A novel type of regularization, namely Positiveness Exclusive Regularization (PER), is introduced to induce the following prior: if an unlabeled sample receives significant positive response from one of the classifiers, it is less likely for this sample to receive positive responses from the other classifiers. That is, we expect an exclusive relationship among different S3VMs for evaluating the same unlabeled sample. We propose to use an ℓ1,2-norm regularizer as an implementation of PER. The objective of our approach is to minimize an empirical risk regularized by a PER term and a manifold regularization term. An efficient Nesterov-type smoothing approximation based method is developed for optimization. Evaluations with comparisons are conducted on several benchmarks for visual classification to demonstrate the advantages of the proposed method.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87352335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear stereo matching","authors":"Leonardo De-Maeztu, S. Mattoccia, A. Villanueva, R. Cabeza","doi":"10.1109/ICCV.2011.6126434","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126434","url":null,"abstract":"Recent local stereo matching algorithms based on an adaptive-weight strategy achieve accuracy similar to global approaches. One of the major problems of these algorithms is that they are computationally expensive and this complexity increases proportionally to the window size. This paper proposes a novel cost aggregation step with complexity independent of the window size (i.e. O(1)) that outperforms state-of-the-art O(1) methods. Moreover, compared to other O(1) approaches, our method does not rely on integral histograms enabling aggregation using colour images instead of grayscale ones. Finally, to improve the results of the proposed algorithm a disparity refinement pipeline is also proposed. The overall algorithm produces results comparable to those of state-of-the-art stereo matching algorithms.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84108777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust topological features for deformation invariant image matching","authors":"E. Lobaton, Ramanarayan Vasudevan, R. Alterovitz, R. Bajcsy","doi":"10.1109/ICCV.2011.6126538","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126538","url":null,"abstract":"Local photometric descriptors are a crucial low level component of numerous computer vision algorithms. In practice, these descriptors are constructed to be invariant to a class of transformations. However, the development of a descriptor that is simultaneously robust to noise and invariant under general deformation has proven difficult. In this paper, we introduce the Topological-Attributed Relational Graph (T-ARG), a new local photometric descriptor constructed from homology that is provably invariant to locally bounded deformation. This new robust topological descriptor is backed by a formal mathematical framework. We apply T-ARG to a set of benchmark images to evaluate its performance. Results indicate that T-ARG significantly outperforms traditional descriptors for noisy, deforming images.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82827172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kinecting the dots: Particle based scene flow from depth sensors","authors":"Simon Hadfield, R. Bowden","doi":"10.1109/ICCV.2011.6126509","DOIUrl":"https://doi.org/10.1109/ICCV.2011.6126509","url":null,"abstract":"The motion field of a scene can be used for object segmentation and to provide features for classification tasks like action recognition. Scene flow is the full 3D motion field of the scene, and is more difficult to estimate than it's 2D counterpart, optical flow. Current approaches use a smoothness cost for regularisation, which tends to over-smooth at object boundaries. This paper presents a novel formulation for scene flow estimation, a collection of moving points in 3D space, modelled using a particle filter that supports multiple hypotheses and does not oversmooth the motion field. In addition, this paper is the first to address scene flow estimation, while making use of modern depth sensors and monocular appearance images, rather than traditional multi-viewpoint rigs. The algorithm is applied to an existing scene flow dataset, where it achieves comparable results to approaches utilising multiple views, while taking a fraction of the time.","PeriodicalId":6391,"journal":{"name":"2011 International Conference on Computer Vision","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83243789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}