{"title":"Learning SURF Cascade for Fast and Accurate Object Detection","authors":"Jianguo Li, Yimin Zhang","doi":"10.1109/CVPR.2013.445","DOIUrl":"https://doi.org/10.1109/CVPR.2013.445","url":null,"abstract":"This paper presents a novel learning framework for training boosting cascade based object detector from large scale dataset. The framework is derived from the well-known Viola-Jones (VJ) framework but distinguished by three key differences. First, the proposed framework adopts multi-dimensional SURF features instead of single dimensional Haar features to describe local patches. In this way, the number of used local patches can be reduced from hundreds of thousands to several hundreds. Second, it adopts logistic regression as weak classifier for each local patch instead of decision trees in the VJ framework. Third, we adopt AUC as a single criterion for the convergence test during cascade training rather than the two trade-off criteria (false-positive-rate and hit-rate) in the VJ framework. The benefit is that the false-positive-rate can be adaptive among different cascade stages, and thus yields much faster convergence speed of SURF cascade. Combining these points together, the proposed approach has three good properties. First, the boosting cascade can be trained very efficiently. Experiments show that the proposed approach can train object detectors from billions of negative samples within one hour even on personal computers. Second, the built detector is comparable to the state-of-the-art algorithm not only on the accuracy but also on the processing speed. Third, the built detector is small in model-size due to short cascade stages.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"15 1","pages":"3468-3475"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73272165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Articulated and Restricted Motion Subspaces and Their Signatures","authors":"Bastien Jacquet, Roland Angst, M. Pollefeys","doi":"10.1109/CVPR.2013.198","DOIUrl":"https://doi.org/10.1109/CVPR.2013.198","url":null,"abstract":"Articulated objects represent an important class of objects in our everyday environment. Automatic detection of the type of articulated or otherwise restricted motion and extraction of the corresponding motion parameters are therefore of high value, eg in order to augment an otherwise static 3D reconstruction with dynamic semantics, such as rotation axes and allowable translation directions for certain rigid parts or objects. Hence, in this paper, a novel theory to analyse relative transformations between two motion-restricted parts will be presented. The analysis is based on linear subspaces spanned by relative transformations. Moreover, a signature for relative transformations will be introduced which uniquely specifies the type of restricted motion encoded in these relative transformations. This theoretic framework enables the derivation of novel algebraic constraints, such as low-rank constraints for subsequent rotations around two fixed axes for example. Lastly, given the type of restricted motion as predicted by the signature, the paper shows how to extract all the motion parameters with matrix manipulations from linear algebra. Our theory is verified on several real data sets, such as a rotating blackboard or a wheel rolling on the floor amongst others.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"1 1","pages":"1506-1513"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72607878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boundary Cues for 3D Object Shape Recovery","authors":"Kevin Karsch, Zicheng Liao, Jason Rock, Jonathan T. Barron, Derek Hoiem","doi":"10.1109/CVPR.2013.281","DOIUrl":"https://doi.org/10.1109/CVPR.2013.281","url":null,"abstract":"Early work in computer vision considered a host of geometric cues for both shape reconstruction and recognition. However, since then, the vision community has focused heavily on shading cues for reconstruction, and moved towards data-driven approaches for recognition. In this paper, we reconsider these perhaps overlooked \"boundary\" cues (such as self occlusions and folds in a surface), as well as many other established constraints for shape reconstruction. In a variety of user studies and quantitative tasks, we evaluate how well these cues inform shape reconstruction (relative to each other) in terms of both shape quality and shape recognition. Our findings suggest many new directions for future research in shape reconstruction, such as automatic boundary cue detection and relaxing assumptions in shape from shading (e.g. orthographic projection, Lambertian surfaces).","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"49 1","pages":"2163-2170"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73395677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning the Change for Automatic Image Cropping","authors":"Jianzhou Yan, Stephen Lin, S. B. Kang, Xiaoou Tang","doi":"10.1109/CVPR.2013.130","DOIUrl":"https://doi.org/10.1109/CVPR.2013.130","url":null,"abstract":"Image cropping is a common operation used to improve the visual quality of photographs. In this paper, we present an automatic cropping technique that accounts for the two primary considerations of people when they crop: removal of distracting content, and enhancement of overall composition. Our approach utilizes a large training set consisting of photos before and after cropping by expert photographers to learn how to evaluate these two factors in a crop. In contrast to the many methods that exist for general assessment of image quality, ours specifically examines differences between the original and cropped photo in solving for the crop parameters. To this end, several novel image features are proposed to model the changes in image content and composition when a crop is applied. Our experiments demonstrate improvements of our method over recent cropping algorithms on a broad range of images.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"26 1","pages":"971-978"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79227569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Beta Process Joint Dictionary Learning for Coupled Feature Spaces with Application to Single Image Super-Resolution","authors":"Li He, H. Qi, R. Zaretzki","doi":"10.1109/CVPR.2013.51","DOIUrl":"https://doi.org/10.1109/CVPR.2013.51","url":null,"abstract":"This paper addresses the problem of learning over-complete dictionaries for the coupled feature spaces, where the learned dictionaries also reflect the relationship between the two spaces. A Bayesian method using a beta process prior is applied to learn the over-complete dictionaries. Compared to previous couple feature spaces dictionary learning algorithms, our algorithm not only provides dictionaries that customized to each feature space, but also adds more consistent and accurate mapping between the two feature spaces. This is due to the unique property of the beta process model that the sparse representation can be decomposed to values and dictionary atom indicators. The proposed algorithm is able to learn sparse representations that correspond to the same dictionary atoms with the same sparsity but different values in coupled feature spaces, thus bringing consistent and accurate mapping between coupled feature spaces. Another advantage of the proposed method is that the number of dictionary atoms and their relative importance may be inferred non-parametrically. We compare the proposed approach to several state-of-the-art dictionary learning methods by applying this method to single image super-resolution. The experimental results show that dictionaries learned by our method produces the best super-resolution results compared to other state-of-the-art methods.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"14 1","pages":"345-352"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84913267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding Things: Image Parsing with Regions and Per-Exemplar Detectors","authors":"Joseph Tighe, S. Lazebnik","doi":"10.1109/CVPR.2013.386","DOIUrl":"https://doi.org/10.1109/CVPR.2013.386","url":null,"abstract":"This paper presents a system for image parsing, or labeling each pixel in an image with its semantic category, aimed at achieving broad coverage across hundreds of object categories, many of them sparsely sampled. The system combines region-level features with per-exemplar sliding window detectors. Per-exemplar detectors are better suited for our parsing task than traditional bounding box detectors: they perform well on classes with little training data and high intra-class variation, and they allow object masks to be transferred into the test image for pixel-level segmentation. The proposed system achieves state-of-the-art accuracy on three challenging datasets, the largest of which contains 45,676 images and 232 labels.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"11 1","pages":"3001-3008"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84972961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Non-parametric Framework for Document Bleed-through Removal","authors":"Róisín Rowley-Brooke, François Pitié, A. Kokaram","doi":"10.1109/CVPR.2013.380","DOIUrl":"https://doi.org/10.1109/CVPR.2013.380","url":null,"abstract":"This paper presents recent work on a new framework for non-blind document bleed-through removal. The framework includes image preprocessing to remove local intensity variations, pixel region classification based on a segmentation of the joint recto-verso intensity histogram and connected component analysis on the subsequent image labelling. Finally restoration of the degraded regions is performed using exemplar-based image in painting. The proposed method is evaluated visually and numerically on a freely available database of 25 scanned manuscript image pairs with ground truth, and is shown to outperform recent non-blind bleed-through removal techniques.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"15 1","pages":"2954-2960"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85238444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Image Calibration of Multi-axial Imaging Systems","authors":"Amit K. Agrawal, S. Ramalingam","doi":"10.1109/CVPR.2013.184","DOIUrl":"https://doi.org/10.1109/CVPR.2013.184","url":null,"abstract":"Imaging systems consisting of a camera looking at multiple spherical mirrors (reflection) or multiple refractive spheres (refraction) have been used for wide-angle imaging applications. We describe such setups as multi-axial imaging systems, since a single sphere results in an axial system. Assuming an internally calibrated camera, calibration of such multi-axial systems involves estimating the sphere radii and locations in the camera coordinate system. However, previous calibration approaches require manual intervention or constrained setups. We present a fully automatic approach using a single photo of a 2D calibration grid. The pose of the calibration grid is assumed to be unknown and is also recovered. Our approach can handle unconstrained setups, where the mirrors/refractive balls can be arranged in any fashion, not necessarily on a grid. The axial nature of rays allows us to compute the axis of each sphere separately. We then show that by choosing rays from two or more spheres, the unknown pose of the calibration grid can be obtained linearly and independently of sphere radii and locations. Knowing the pose, we derive analytical solutions for obtaining the sphere radius and location. This leads to an interesting result that 6-DOF pose estimation of a multi-axial camera can be done without the knowledge of full calibration. Simulations and real experiments demonstrate the applicability of our algorithm.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"291 1","pages":"1399-1406"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85401547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In Defense of Sparsity Based Face Recognition","authors":"Weihong Deng, Jiani Hu, Jun Guo","doi":"10.1109/CVPR.2013.58","DOIUrl":"https://doi.org/10.1109/CVPR.2013.58","url":null,"abstract":"The success of sparse representation based classification (SRC) has largely boosted the research of sparsity based face recognition in recent years. A prevailing view is that the sparsity based face recognition performs well only when the training images have been carefully controlled and the number of samples per class is sufficiently large. This paper challenges the prevailing view by proposing a ``prototype plus variation'' representation model for sparsity based face recognition. Based on the new model, a Superposed SRC (SSRC), in which the dictionary is assembled by the class centroids and the sample-to-centroid differences, leads to a substantial improvement on SRC. The experiments results on AR, FERET and FRGC databases validate that, if the proposed prototype plus variation representation model is applied, sparse coding plays a crucial role in face recognition, and performs well even when the dictionary bases are collected under uncontrolled conditions and only a single sample per classes is available.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"11 1","pages":"399-406"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81932675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Pose Robust Face Recognition","authors":"Dong Yi, Zhen Lei, S. Li","doi":"10.1109/CVPR.2013.454","DOIUrl":"https://doi.org/10.1109/CVPR.2013.454","url":null,"abstract":"Most existing pose robust methods are too computational complex to meet practical applications and their performance under unconstrained environments are rarely evaluated. In this paper, we propose a novel method for pose robust face recognition towards practical applications, which is fast, pose robust and can work well under unconstrained environments. Firstly, a 3D deformable model is built and a fast 3D model fitting algorithm is proposed to estimate the pose of face image. Secondly, a group of Gabor filters are transformed according to the pose and shape of face image for feature extraction. Finally, PCA is applied on the pose adaptive Gabor features to remove the redundances and Cosine metric is used to evaluate the similarity. The proposed method has three advantages: (1) The pose correction is applied in the filter space rather than image space, which makes our method less affected by the precision of the 3D model, (2) By combining the holistic pose transformation and local Gabor filtering, the final feature is robust to pose and other negative factors in face recognition, (3) The 3D structure and facial symmetry are successfully used to deal with self-occlusion. Extensive experiments on FERET and PIE show the proposed method outperforms state-of-the-art methods significantly, meanwhile, the method works well on LFW.","PeriodicalId":6343,"journal":{"name":"2013 IEEE Conference on Computer Vision and Pattern Recognition","volume":"18 1","pages":"3539-3545"},"PeriodicalIF":0.0,"publicationDate":"2013-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82200654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}