{"title":"A Visual Vocabulary for Flower Classification","authors":"M. Nilsback, Andrew Zisserman","doi":"10.1109/CVPR.2006.42","DOIUrl":"https://doi.org/10.1109/CVPR.2006.42","url":null,"abstract":"We investigate to what extent ‘bag of visual words’ models can be used to distinguish categories which have significant visual similarity. To this end we develop and optimize a nearest neighbour classifier architecture, which is evaluated on a very challenging database of flower images. The flower categories are chosen to be indistinguishable on colour alone (for example), and have considerable variation in shape, scale, and viewpoint. We demonstrate that by developing a visual vocabulary that explicitly represents the various aspects (colour, shape, and texture) that distinguish one flower from another, we can overcome the ambiguities that exist between flower categories. The novelty lies in the vocabulary used for each aspect, and how these vocabularies are combined into a final classifier. The various stages of the classifier (vocabulary selection and combination) are each optimized on a validation set. Results are presented on a dataset of 1360 images consisting of 17 flower species. It is shown that excellent performance can be achieved, far surpassing standard baseline algorithms using (for example) colour cues alone.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134144301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel-based Template Alignment","authors":"I. Guskov","doi":"10.1109/CVPR.2006.162","DOIUrl":"https://doi.org/10.1109/CVPR.2006.162","url":null,"abstract":"This paper introduces a novel kernel-based method for template tracking in video sequences. The method is derived for a general warping transformation, and its application to affine motion tracking is further explored. Our approach is based on maximization of the multi-kernel Bhattacharyya coefficient with respect to the warp parameters. We explicitly compute the gradient of the similarity functional, and use a quasi-Newton procedure for optimization. Additionally, we consider a simple extension of the method that employs an illumination model correction to allow tracking under varying lighting conditions. The resulting tracking procedure is evaluated on a number of examples including large templates tracking non-rigidly moving textured areas.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132946571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Surface Geometric Constraints for Stereo in Belief Propagation","authors":"Gang Li, S. Zucker","doi":"10.1109/CVPR.2006.299","DOIUrl":"https://doi.org/10.1109/CVPR.2006.299","url":null,"abstract":"Belief propagation has been shown to be a powerful inference mechanism for stereo correspondence. However, the classical formulation of belief propagation implicitly imposes the frontal parallel plane assumption in the compatibility matrix for exploiting contextual information, since the priors prefer no depth (disparity) change in surrounding neighborhoods. This results in systematic errors for slanted or curved surfaces. To eliminate these errors we propose to use contextual information geometrically, and show how to encode surface differential geometric properties in the compatibility matrix for stereo correspondence. This enforces consistency for both depth and surface normal, extending the traditional formulation beyond consistency for (constant) depth. With such geometric contextual information, the belief propagation algorithm shows dramatic improvement on generic non-frontal parallel scenes. Several such examples are provided.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"104 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132086455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects","authors":"J. Winn, J. Shotton","doi":"10.1109/CVPR.2006.305","DOIUrl":"https://doi.org/10.1109/CVPR.2006.305","url":null,"abstract":"This paper addresses the problem of detecting and segmenting partially occluded objects of a known category. We first define a part labelling which densely covers the object. Our Layout Consistent Random Field (LayoutCRF) model then imposes asymmetric local spatial constraints on these labels to ensure the consistent layout of parts whilst allowing for object deformation. Arbitrary occlusions of the object are handled by avoiding the assumption that the whole object is visible. The resulting system is both efficient to train and to apply to novel images, due to a novel annealed layout-consistent expansion move algorithm paired with a randomised decision tree classifier. We apply our technique to images of cars and faces and demonstrate state-of-the-art detection and segmentation performance even in the presence of partial occlusion.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133244953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Separation of Highlight Reflections on Textured Surfaces","authors":"P. Tan, Long Quan, Stephen Lin","doi":"10.1109/CVPR.2006.273","DOIUrl":"https://doi.org/10.1109/CVPR.2006.273","url":null,"abstract":"We present a method for separating highlight reflections on textured surfaces. In contrast to previous techniques that use diffuse color information from outside the highlight area to constrain the solution, the proposed method further capitalizes on the spatial distributions of colors to resolve ambiguities in separation that often arise in real images. For highlight pixels in which a clear-cut separation cannot be determined from color space analysis, we evaluate possible separation solutions based on their consistency with diffuse texture characteristics outside the highlight. With consideration of color distributions in both the color space and the image space, appreciably enhanced separation performance can be attained in challenging cases.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128914469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bilayer Segmentation of Live Video","authors":"A. Criminisi, G. Cross, A. Blake, V. Kolmogorov","doi":"10.1109/CVPR.2006.69","DOIUrl":"https://doi.org/10.1109/CVPR.2006.69","url":null,"abstract":"This paper presents an algorithm capable of real-time separation of foreground from background in monocular video sequences. Automatic segmentation of layers from colour/contrast or from motion alone is known to be error-prone. Here motion, colour and contrast cues are probabilistically fused together with spatial and temporal priors to infer layers accurately and efficiently. Central to our algorithm is the fact that pixel velocities are not needed, thus removing the need for optical flow estimation, with its tendency to error and computational expense. Instead, an efficient motion vs nonmotion classifier is trained to operate directly and jointly on intensity-change and contrast. Its output is then fused with colour information. The prior on segmentation is represented by a second order, temporal, Hidden Markov Model, together with a spatial MRF favouring coherence except where contrast is high. Finally, accurate layer segmentation and explicit occlusion detection are efficiently achieved by binary graph cut. The segmentation accuracy of the proposed algorithm is quantitatively evaluated with respect to existing groundtruth data and found to be comparable to the accuracy of a state of the art stereo segmentation algorithm. Foreground/background segmentation is demonstrated in the application of live background substitution and shown to generate convincingly good quality composite video.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134461925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Intrinsic Component Images using Non-Linear Regression","authors":"M. Tappen, E. Adelson, W. Freeman","doi":"10.1109/CVPR.2006.114","DOIUrl":"https://doi.org/10.1109/CVPR.2006.114","url":null,"abstract":"Images can be represented as the composition of multiple intrinsic component images, such as shading, albedo, and noise images. In this paper, we present a method for estimating intrinsic component images from a single image, which we apply to the problems of estimating shading and albedo images and image denoising. Our method is based on learning estimators that predict filtered versions of the desired image. Unlike previous approaches, our method does not require unnatural discretizations of the problem. We also demonstrate how to learn a weighting function that properly weights the local estimates when constructing the estimated image. For shading estimation, we introduce a new training set of real-world images. The accuracy of our method is measured both qualitatively and quantitatively, showing better performance on the shading/albedo separation problem than previous approaches. The performance on denoising is competitive with the current state of the art.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131843079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pursuing Informative Projection on Grassmann Manifold","authors":"Dahua Lin, Shuicheng Yan, Xiaoou Tang","doi":"10.1109/CVPR.2006.231","DOIUrl":"https://doi.org/10.1109/CVPR.2006.231","url":null,"abstract":"Inspired by the underlying relationship between classification capability and the mutual information, in this paper, we first establish a quantitative model to describe the information transmission process from feature extraction to final classification and identify the critical channel in this propagation path, and then propose a Maximum Effective Information Criteria for pursuing the optimal subspace in the sense of preserving maximum information that can be conveyed to final decision. Considering the orthogonality and rotation invariance properties of the solution space, we present a Conjugate Gradient method constrained on a Grassmann manifold to exploit the geometric traits of the solution space for enhancing the efficiency of optimization. Comprehensive experiments demonstrate that the framework integrating the Maximum Effective Information Criteria and Grassmann manifold-based optimization method significantly improves the classification performance.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127419614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Building Detection and Modeling from Aerial LIDAR Data","authors":"Vivek Verma, Rakesh Kumar, S. Hsu","doi":"10.1109/CVPR.2006.12","DOIUrl":"https://doi.org/10.1109/CVPR.2006.12","url":null,"abstract":"This paper presents a method to detect and construct a 3D geometric model of an urban area with complex buildings using aerial LIDAR (Light Detection and Ranging) data. The LIDAR data collected from a nadir direction is a point cloud containing surface samples of not only the building roofs and terrain but also undesirable clutter from trees, cars, etc. The main contribution of this work is the automatic recognition and estimation of simple parametric shapes that can be combined to model very complex buildings from aerial LIDAR data. The main components of the detection and modeling algorithms are (i) Segmentation of roof and terrain points. (ii) Roof topology inference. We introduce the concept of a roof-topology graph to represent the relationships between the various planar patches of a complex roof structure. (iii) Parametric roof composition. Simple parametric roof shapes that can be combined to create a complex roof structure of a building are recognized by searching for sub-graphs in its roof-topology graph. (iv) Terrain modeling. The terrain is identified and modeled as a triangulated mesh. Finally, we provide experimental results that demonstrate the validity of our approach for rapid and automatic building detection and geometric modeling with real LIDAR data. We are able to model cities and other urban areas at the rate of about 10 minutes per sq. mile on a low-end PC.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133811670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Segmentation by Level Sets and Symmetry","authors":"Tammy Riklin-Raviv, N. Kiryati, N. Sochen","doi":"10.1109/CVPR.2006.270","DOIUrl":"https://doi.org/10.1109/CVPR.2006.270","url":null,"abstract":"Shape symmetry is an important cue for image understanding. In the absence of more detailed prior shape information, segmentation can be significantly facilitated by symmetry. However, when symmetry is distorted by perspectivity, the detection of symmetry becomes non-trivial, thus complicating symmetry-aided segmentation. We present an original approach for segmentation of symmetrical objects accommodating perspective distortion. The key idea is the use of the replicative form induced by the symmetry for challenging segmentation tasks. This is accomplished by dynamic extraction of the object boundaries, based on the image gradients, gray levels or colors, concurrently with registration of the image symmetrical counterpart (e.g. reflection) to itself. The symmetrical counterpart of the evolving object contour supports the segmentation by resolving possible ambiguities due to noise, clutter, distortion, shadows, occlusions and assimilation with the background. The symmetry constraint is integrated in a comprehensive level-set functional for segmentation that determines the evolution of the delineating contour. The proposed framework is exemplified on various images of skew-symmetrical objects and its superiority over state-of-the-art variational segmentation techniques is demonstrated.","PeriodicalId":421737,"journal":{"name":"2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2006-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115765025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}