{"title":"Patch-based Image Correlation with Rapid Filtering","authors":"G. Guo, C. Dyer","doi":"10.1109/CVPR.2007.383373","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383373","url":null,"abstract":"This paper describes a patch-based approach for rapid image correlation or template matching. By representing a template image with an ensemble of patches, the method is robust with respect to variations such as local appearance variation, partial occlusion, and scale changes. Rectangle filters are applied to each image patch for fast filtering based on the integral image representation. A new method is developed for feature dimension reduction by detecting the \"salient\" image structures given a single image. Experiments on a variety of images show the success of the method in dealing with different variations in the test images. In terms of computation time, the approach is faster than traditional methods by up to two orders of magnitude and is at least three times faster than a fast implementation of normalized cross correlation.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125605215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensor noise modeling using the Skellam distribution: Application to the color edge detection","authors":"Youngbae Hwang, Jun-Sik Kim, In-So Kweon","doi":"10.1109/CVPR.2007.383004","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383004","url":null,"abstract":"In this paper, we introduce the Skellam distribution as a sensor noise model for CCD or CMOS cameras. This is derived from the Poisson distribution of photons that determine the sensor response. We show that the Skellam distribution can be used to measure the intensity difference of pixels in the spatial domain, as well as in the temporal domain. In addition, we show that Skellam parameters are linearly related to the intensity of the pixels. This property means that the brighter pixels tolerate greater variation of intensity than the darker pixels. This enables us to decide automatically whether two pixels have different colors. We apply this modeling to detect the edges in color images. The resulting algorithm requires only a confidence interval for a hypothesis test, because it uses the distribution of image noise directly. More importantly, we demonstrate that without conventional Gaussian smoothing the noise model-based approach can automatically extract the fine details of image structures, such as edges and corners, independent of camera setting.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"301 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122636612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pose and Illumination Invariant Face Recognition in Video","authors":"Yilei Xu, A. Roy-Chowdhury, Keyur Patel","doi":"10.1109/CVPR.2007.383376","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383376","url":null,"abstract":"The use of video sequences for face recognition has been relatively less studied than image-based approaches. In this paper, we present a framework for face recognition from video sequences that is robust to large changes in facial pose and lighting conditions. Our method is based on a recently obtained theoretical result that can integrate the effects of motion, lighting and shape in generating an image using a perspective camera. This result can be used to estimate the pose and illumination conditions for each frame of the probe sequence. Then, using a 3D face model, we synthesize images corresponding to the pose and illumination conditions estimated in the probe sequences. Similarity between the synthesized images and the probe video is computed by integrating over the entire sequence. The method can handle situations where the pose and lighting conditions in the training and testing data are completely disjoint.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122990765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discriminative Cluster Refinement: Improving Object Category Recognition Given Limited Training Data","authors":"Liu Yang, Rong Jin, C. Pantofaru, R. Sukthankar","doi":"10.1109/CVPR.2007.383270","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383270","url":null,"abstract":"A popular approach to problems in image classification is to represent the image as a bag of visual words and then employ a classifier to categorize the image. Unfortunately, a significant shortcoming of this approach is that the clustering and classification are disconnected. Since the clustering into visual words is unsupervised, the representation does not necessarily capture the aspects of the data that are most useful for classification. More seriously, the semantic relationship between clusters is lost, causing the overall classification performance to suffer. We introduce \"discriminative cluster refinement\" (DCR), a method that explicitly models the pairwise relationships between different visual words by exploiting their co-occurrence information. The assigned class labels are used to identify the co-occurrence patterns that are most informative for object classification. DCR employs a maximum-margin approach to generate an optimal kernel matrix for classification. One important benefit of DCR is that it integrates smoothly into existing bag-of-words information retrieval systems by employing the set of visual words generated by any clustering method. While DCR could improve a broad class of information retrieval systems, this paper focuses on object category recognition. We present a direct comparison with a state-of-the-art method on the PASCAL 2006 database and show that cluster refinement results in a significant improvement in classification accuracy given a small number of training examples.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123008055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Performance Prediction and Validation for Multisensor Fusion","authors":"Rong Wang, B. Bhanu","doi":"10.1109/CVPR.2007.383112","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383112","url":null,"abstract":"Multiple sensors are commonly fused to improve the detection and recognition performance of computer vision and pattern recognition systems. The traditional approach to determine the optimal sensor combination is to try all possible sensor combinations by performing exhaustive experiments. In this paper, we present a theoretical approach that predicts the performance of sensor fusion that allows us to select the optimal combination. We start with the characteristics of each sensor by computing the match score and non-match score distributions of objects to be recognized. These distributions are modeled as a mixture of Gaussians. Then, we use an explicit Phi transformation that maps a receiver operating characteristic (ROC) curve to a straight line in 2-D space whose axes are related to the false alarm rate (FAR) and the Hit rate (Hit). Finally, using this representation, we derive a set of metrics to evaluate the sensor fusion performance and find the optimal sensor combination. We verify our prediction approach on the publicly available XM2VTS database as well as other databases.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131564386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Distance Metric Learning for Clustering","authors":"Jieping Ye, Zheng Zhao, Huan Liu","doi":"10.1109/CVPR.2007.383103","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383103","url":null,"abstract":"A good distance metric is crucial for unsupervised learning from high-dimensional data. To learn a metric without any constraint or class label information, most unsupervised metric learning algorithms appeal to projecting observed data onto a low-dimensional manifold, where geometric relationships such as local or global pairwise distances are preserved. However, the projection may not necessarily improve the separability of the data, which is the desirable outcome of clustering. In this paper, we propose a novel unsupervised adaptive metric learning algorithm, called AML, which performs clustering and distance metric learning simultaneously. AML projects the data onto a low-dimensional manifold, where the separability of the data is maximized. We show that the joint clustering and distance metric learning can be formulated as a trace maximization problem, which can be solved via an iterative procedure in the EM framework. Experimental results on a collection of benchmark data sets demonstrated the effectiveness of the proposed algorithm.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127626591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Detection via Classification on Riemannian Manifolds","authors":"Oncel Tuzel, F. Porikli, P. Meer","doi":"10.1109/CVPR.2007.383197","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383197","url":null,"abstract":"We present a new algorithm to detect humans in still images utilizing covariance matrices as object descriptors. Since these descriptors do not lie on a vector space, well known machine learning techniques are not adequate to learn the classifiers. The space of d-dimensional nonsingular covariance matrices can be represented as a connected Riemannian manifold. We present a novel approach for classifying points lying on a Riemannian manifold by incorporating the a priori information about the geometry of the space. The algorithm is tested on INRIA human database where superior detection rates are observed over the previous approaches.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128081196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Constant Focal Length Self-Calibration From Multiple Views","authors":"B. Bocquillon, A. Bartoli, Pierre Gurdjos, Alain Crouzil","doi":"10.1109/CVPR.2007.383066","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383066","url":null,"abstract":"We investigate the problem of finding the metric structure of a general 3D scene viewed by a moving camera with square pixels and constant unknown focal length. While the problem has a concise and well-understood formulation in the stratified framework thanks to the absolute dual quadric, two open issues remain. The first issue concerns the generic Critical Motion Sequences, i.e. camera motions for which self-calibration is ambiguous. Most of the previous work focuses on the varying focal length case. We provide a thorough study of the constant focal length case. The second issue is to solve the nonlinear set of equations in four unknowns arising from the dual quadric formulation. Most of the previous work either does local nonlinear optimization, thereby requiring an initial solution, or linearizes the problem, which introduces artificial degeneracies, most of which are likely to arise in practice. We use interval analysis to solve this problem. The resulting algorithm is guaranteed to find the solution and is not subject to artificial degeneracies. Directly using interval analysis usually results in computationally expensive algorithms. We propose a carefully chosen set of inclusion functions, making it possible to find the solution within a few seconds. Comparisons of the proposed algorithm with existing ones are reported for simulated and real data.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132560289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning a Spatially Smooth Subspace for Face Recognition","authors":"Deng Cai, Xiaofei He, Yuxiao Hu, Jiawei Han, Thomas S. Huang","doi":"10.1109/CVPR.2007.383054","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383054","url":null,"abstract":"Subspace learning based face recognition methods have attracted considerable interest in recent years, including principal component analysis (PCA), linear discriminant analysis (LDA), locality preserving projection (LPP), neighborhood preserving embedding (NPE), marginal fisher analysis (MFA) and local discriminant embedding (LDE). These methods consider an n1 × n2 image as a vector in R^{n1 × n2}, and the pixels of each image are considered as independent. However, an image represented in the plane is intrinsically a matrix, and pixels spatially close to each other may be correlated. Even though we have n1 × n2 pixels per image, this spatial correlation suggests the real number of degrees of freedom is far less. In this paper, we introduce a regularized subspace learning model using a Laplacian penalty to constrain the coefficients to be spatially smooth. All these existing subspace learning algorithms can fit into this model and produce a spatially smooth subspace which is better for image representation than their original versions. Recognition, clustering and retrieval can then be performed in the image subspace. Experimental results on face recognition demonstrate the effectiveness of our method.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132744373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Learning of Image Transformations","authors":"R. Memisevic, Geoffrey E. Hinton","doi":"10.1109/CVPR.2007.383036","DOIUrl":"https://doi.org/10.1109/CVPR.2007.383036","url":null,"abstract":"We describe a probabilistic model for learning rich, distributed representations of image transformations. The basic model is defined as a gated conditional random field that is trained to predict transformations of its inputs using a factorial set of latent variables. Inference in the model consists in extracting the transformation, given a pair of images, and can be performed exactly and efficiently. We show that, when trained on natural videos, the model develops domain specific motion features, in the form of fields of locally transformed edge filters. When trained on affine, or more general, transformations of still images, the model develops codes for these transformations, and can subsequently perform recognition tasks that are invariant under these transformations. It can also fantasize new transformations on previously unseen images. We describe several variations of the basic model and provide experimental results that demonstrate its applicability to a variety of tasks.","PeriodicalId":351008,"journal":{"name":"2007 IEEE Conference on Computer Vision and Pattern Recognition","volume":"48 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132810568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}