Title: Hyper-class augmented and regularized deep learning for fine-grained image classification
Authors: Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298880
Abstract: Deep convolutional neural networks (CNNs) have seen tremendous success in large-scale generic object recognition. In comparison with generic object recognition, fine-grained image classification (FGIC) is much more challenging because (i) fine-grained labeled data is much more expensive to acquire (usually requiring domain expertise), and (ii) there is large intra-class and small inter-class variance. Most recent work exploiting deep CNNs for image recognition with small training data adopts a simple strategy: pre-train a deep CNN on a large-scale external dataset (e.g., ImageNet) and fine-tune it on the small-scale target data to fit the specific classification task. In this paper, beyond the fine-tuning strategy, we propose a systematic framework for learning a deep CNN that addresses the challenges from two new perspectives: (i) identifying easily annotated hyper-classes inherent in the fine-grained data, acquiring a large number of hyper-class-labeled images from readily available external sources (e.g., image search engines), and formulating the problem as multi-task learning; (ii) a novel learning model that exploits a regularization between the fine-grained recognition model and the hyper-class recognition model. We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.
{"title":"Efficient illuminant estimation for color constancy using grey pixels","authors":"Kai-Fu Yang, Shaobing Gao, Yongjie Li","doi":"10.1109/CVPR.2015.7298838","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298838","url":null,"abstract":"Illuminant estimation is a key step for computational color constancy. Instead of using the grey world or grey edge assumptions, we propose in this paper a novel method for illuminant estimation by using the information of grey pixels detected in a given color-biased image. The underlying hypothesis is that most of the natural images include some detectable pixels that are at least approximately grey, which can be reliably utilized for illuminant estimation. We first validate our assumption through comprehensive statistical evaluation on diverse collection of datasets and then put forward a novel grey pixel detection method based on the illuminant-invariant measure (IIM) in three logarithmic color channels. Then the light source color of a scene can be easily estimated from the detected grey pixels. Experimental results on four benchmark datasets (three recorded under single illuminant and one under multiple illuminants) show that the proposed method outperforms most of the state-of-the-art color constancy approaches with the inherent merit of low computational cost.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"427 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115652728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Effective learning-based illuminant estimation using simple features
Authors: Dongliang Cheng, Brian L. Price, Scott D. Cohen, M. S. Brown
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298702
Abstract: Illumination estimation is the process of determining the chromaticity of the illumination in an imaged scene in order to remove undesirable color casts through white-balancing. While computational color constancy is a well-studied topic in computer vision, it remains challenging due to the ill-posed nature of the problem. One class of techniques relies on low-level statistical information in the image color distribution and works under various assumptions (e.g., Grey-World, White-Patch). These methods have the advantage of being simple and fast, but often do not perform well. More recent state-of-the-art methods employ learning-based techniques that produce better results, but often rely on complex features and have long evaluation and training times. In this paper, we present a learning-based method based on four simple color features and show how to use it with an ensemble of regression trees to estimate the illumination. We demonstrate that our approach is not only faster than existing learning-based methods in terms of both evaluation and training time, but also gives the best results reported to date on modern color constancy datasets.
Title: Expanding object detector's horizon: Incremental learning framework for object detection in videos
Authors: Alina Kuznetsova, Sung Ju Hwang, B. Rosenhahn, L. Sigal
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298597
Abstract: Over the last several years it has been shown that image-based object detectors are sensitive to the training data and often fail to generalize to examples that fall outside the original training sample domain (e.g., videos). A number of domain adaptation (DA) techniques have been proposed to address this problem. DA approaches are designed to adapt a fixed-complexity model to the new (e.g., video) domain. We posit that unlabeled data should not only allow adaptation, but also improve (or at least maintain) performance on the original and other domains by dynamically adjusting model complexity and parameters. We call this notion domain expansion. To this end, we develop a new scalable and accurate incremental object detection algorithm, based on several extensions of large-margin embedding (LME). Our detection model consists of an embedding space and multiple class prototypes in that embedding space that represent object classes; distance to those prototypes allows us to reason about multi-class detection. By incrementally detecting object instances in video and adding confident detections into the model, we are able to dynamically adjust the complexity of the detector over time by instantiating new prototypes to span all domains the model has seen. We test the performance of our approach by expanding an object detector trained on ImageNet to detect objects in egocentric videos of the Activities of Daily Living (ADL) dataset and challenging videos from the YouTube Objects (YTO) dataset.
Title: Completing 3D object shape from one depth image
Authors: Jason Rock, Tanmay Gupta, J. Thorsen, JunYoung Gwak, Daeyun Shin, Derek Hoiem
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298863
Abstract: Our goal is to recover a complete 3D model from a depth image of an object. Existing approaches rely on user interaction or apply to a limited class of objects, such as chairs. We aim to fully automatically reconstruct a 3D model from any category. We take an exemplar-based approach: retrieve similar objects in a database of 3D models using view-based matching and transfer the symmetries and surfaces from retrieved models. We investigate completion of 3D models in three cases: novel view (model in database); novel model (models for other objects of the same category in database); and novel category (no models from the category in database).
Title: Scene labeling with LSTM recurrent neural networks
Authors: Wonmin Byeon, T. Breuel, Federico Raue, M. Liwicki
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298977
Abstract: This paper addresses the problem of pixel-level segmentation and classification of scene images with an entirely learning-based approach using Long Short-Term Memory (LSTM) recurrent neural networks, which are commonly used for sequence classification. We investigate two-dimensional (2D) LSTM networks for natural scene images, taking into account the complex spatial dependencies of labels. Prior methods generally have required separate classification and image segmentation stages and/or pre- and post-processing. In our approach, classification, segmentation, and context integration are all carried out by 2D LSTM networks, allowing texture and spatial model parameters to be learned within a single model. The networks efficiently capture local and global contextual information over raw RGB values and adapt well to complex scene images. Our approach, which has a much lower computational complexity than prior methods, achieved state-of-the-art performance on the Stanford Background and SIFT Flow datasets. In fact, if no pre- or post-processing is applied, LSTM networks outperform other state-of-the-art approaches. Hence, even on a single-core Central Processing Unit (CPU), the running time of our approach is comparable to or better than that of the compared state-of-the-art approaches, which use a Graphics Processing Unit (GPU). Finally, our networks' ability to visualize feature maps from each layer supports the hypothesis that LSTM networks are overall suited for image processing tasks.
{"title":"Query-adaptive late fusion for image search and person re-identification","authors":"Liang Zheng, Shengjin Wang, Lu Tian, Fei He, Ziqiong Liu, Q. Tian","doi":"10.1109/CVPR.2015.7298783","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298783","url":null,"abstract":"Feature fusion has been proven effective [35, 36] in image search. Typically, it is assumed that the to-be-fused heterogeneous features work well by themselves for the query. However, in a more realistic situation, one does not know in advance whether a feature is effective or not for a given query. As a result, it is of great importance to identify feature effectiveness in a query-adaptive manner.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"516 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123566700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Reconstructing the world* in six days
Authors: Jared Heinly, Johannes L. Schönberger, Enrique Dunn, Jan-Michael Frahm
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298949
Abstract: We propose a novel, large-scale, structure-from-motion framework that advances the state of the art in data scalability from city-scale modeling (millions of images) to world-scale modeling (several tens of millions of images) using just a single computer. The main enabling technology is the use of a streaming-based framework for connected component discovery. Moreover, our system employs an adaptive, online, iconic image clustering approach based on an augmented bag-of-words representation, in order to balance the goals of registration, comprehensiveness, and data compactness. We demonstrate our proposal by operating on a recent publicly available 100 million image crowd-sourced photo collection containing images geographically distributed throughout the entire world. Results illustrate that our streaming-based approach does not compromise model completeness, but achieves unprecedented levels of efficiency and scalability.
Title: ℋC-search for structured prediction in computer vision
Authors: Michael Lam, J. Doppa, S. Todorovic, Thomas G. Dietterich
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7299126
Abstract: The mainstream approach to structured prediction problems in computer vision is to learn an energy function such that the solution minimizes that function. At prediction time, this approach must solve an often-challenging optimization problem. Search-based methods provide an alternative that has the potential to achieve higher performance. These methods learn to control a search procedure that constructs and evaluates candidate solutions. The recently developed ℋC-Search method has been shown to achieve state-of-the-art results in natural language processing, but only mixed success when applied to vision problems. This paper studies whether ℋC-Search can achieve similarly competitive performance on basic vision tasks such as object detection, scene labeling, and monocular depth estimation, where the leading paradigm is energy minimization. To this end, we introduce a search operator suited to the vision domain that improves a candidate solution by probabilistically sampling likely object configurations in the scene from the hierarchical Berkeley segmentation. We complement this search operator by applying the DAgger algorithm to robustly train the search heuristic so it learns from its previous mistakes. Our evaluation shows that these improvements reduce the branching factor and search depth, and thus give a significant performance boost. Our state-of-the-art results on scene labeling and depth estimation suggest that ℋC-Search provides a suitable tool for learning and inference in vision.
Title: Reweighted Laplace prior based hyperspectral compressive sensing for unknown sparsity
Authors: Lei Zhang, Wei Wei, Yanning Zhang, Chunna Tian, Fei Li
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298840
Abstract: Compressive sensing (CS) has been exploited for hyperspectral image (HSI) compression in recent years. Though it can greatly reduce the costs of computation and storage, the reconstruction of an HSI from a few linear measurements is challenging. The underlying sparsity of the HSI is crucial to improving the reconstruction accuracy. However, the sparsity of an HSI is unknown in reality and varies with the noise, which makes sparsity estimation difficult. To address this problem, a novel reweighted Laplace prior based hyperspectral compressive sensing method is proposed in this study. First, the reweighted Laplace prior is proposed to model the distribution of sparsity in the HSI. Second, a latent variable Bayes model is employed to learn the optimal configuration of the reweighted Laplace prior from the measurements. The model unifies signal recovery, prior learning, and noise estimation into a variational framework to infer the parameters automatically. The learned sparsity prior represents the underlying structure of the sparse signal very well and is adaptive to the unknown noise, which improves the reconstruction accuracy of the HSI. Experimental results on three hyperspectral datasets demonstrate that the proposed method outperforms several state-of-the-art hyperspectral CS methods in reconstruction accuracy.