Title: Hyper-class augmented and regularized deep learning for fine-grained image classification
Authors: Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298880
Abstract: Deep convolutional neural networks (CNNs) have seen tremendous success in large-scale generic object recognition. In comparison with generic object recognition, fine-grained image classification (FGIC) is much more challenging because (i) fine-grained labeled data is much more expensive to acquire (usually requiring domain expertise), and (ii) there is large intra-class and small inter-class variance. Most recent work exploiting deep CNNs for image recognition with small training data adopts a simple strategy: pre-train a deep CNN on a large-scale external dataset (e.g., ImageNet) and fine-tune it on the small-scale target data to fit the specific classification task. In this paper, beyond the fine-tuning strategy, we propose a systematic framework for learning a deep CNN that addresses the challenges from two new perspectives: (i) identifying easily annotated hyper-classes inherent in the fine-grained data, acquiring a large number of hyper-class-labeled images from readily available external sources (e.g., image search engines), and formulating the problem as multi-task learning; (ii) a novel learning model that exploits a regularization between the fine-grained recognition model and the hyper-class recognition model. We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.
{"title":"Efficient illuminant estimation for color constancy using grey pixels","authors":"Kai-Fu Yang, Shaobing Gao, Yongjie Li","doi":"10.1109/CVPR.2015.7298838","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298838","url":null,"abstract":"Illuminant estimation is a key step for computational color constancy. Instead of using the grey world or grey edge assumptions, we propose in this paper a novel method for illuminant estimation by using the information of grey pixels detected in a given color-biased image. The underlying hypothesis is that most of the natural images include some detectable pixels that are at least approximately grey, which can be reliably utilized for illuminant estimation. We first validate our assumption through comprehensive statistical evaluation on diverse collection of datasets and then put forward a novel grey pixel detection method based on the illuminant-invariant measure (IIM) in three logarithmic color channels. Then the light source color of a scene can be easily estimated from the detected grey pixels. Experimental results on four benchmark datasets (three recorded under single illuminant and one under multiple illuminants) show that the proposed method outperforms most of the state-of-the-art color constancy approaches with the inherent merit of low computational cost.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"427 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115652728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Effective learning-based illuminant estimation using simple features
Authors: Dongliang Cheng, Brian L. Price, Scott D. Cohen, M. S. Brown
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298702
Abstract: Illumination estimation is the process of determining the chromaticity of the illumination in an imaged scene in order to remove undesirable color casts through white-balancing. While computational color constancy is a well-studied topic in computer vision, it remains challenging due to the ill-posed nature of the problem. One class of techniques relies on low-level statistical information in the image color distribution and works under various assumptions (e.g., Grey-World, White-Patch). These methods have the advantage of being simple and fast, but often do not perform well. More recent state-of-the-art methods employ learning-based techniques that produce better results, but often rely on complex features and have long evaluation and training times. In this paper, we present a learning-based method based on four simple color features and show how to use it with an ensemble of regression trees to estimate the illumination. We demonstrate that our approach is not only faster than existing learning-based methods in terms of both evaluation and training time, but also gives the best results reported to date on modern color constancy datasets.
Title: Expanding object detector's horizon: Incremental learning framework for object detection in videos
Authors: Alina Kuznetsova, Sung Ju Hwang, B. Rosenhahn, L. Sigal
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298597
Abstract: Over the last several years it has been shown that image-based object detectors are sensitive to the training data and often fail to generalize to examples that fall outside the original training sample domain (e.g., videos). A number of domain adaptation (DA) techniques have been proposed to address this problem. DA approaches are designed to adapt a fixed-complexity model to the new (e.g., video) domain. We posit that unlabeled data should not only allow adaptation, but also improve (or at least maintain) performance on the original and other domains by dynamically adjusting model complexity and parameters. We call this notion domain expansion. To this end, we develop a new scalable and accurate incremental object detection algorithm, based on several extensions of large-margin embedding (LME). Our detection model consists of an embedding space and multiple class prototypes in that embedding space that represent object classes; distance to those prototypes allows us to reason about multi-class detection. By incrementally detecting object instances in video and adding confident detections into the model, we are able to dynamically adjust the complexity of the detector over time by instantiating new prototypes to span all domains the model has seen. We test the performance of our approach by expanding an object detector trained on ImageNet to detect objects in egocentric videos of the Activities of Daily Living (ADL) dataset and challenging videos from the YouTube Objects (YTO) dataset.
Title: Completing 3D object shape from one depth image
Authors: Jason Rock, Tanmay Gupta, J. Thorsen, JunYoung Gwak, Daeyun Shin, Derek Hoiem
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298863
Abstract: Our goal is to recover a complete 3D model from a depth image of an object. Existing approaches rely on user interaction or apply to a limited class of objects, such as chairs. We aim to fully automatically reconstruct a 3D model from any category. We take an exemplar-based approach: retrieve similar objects in a database of 3D models using view-based matching and transfer the symmetries and surfaces from retrieved models. We investigate completion of 3D models in three cases: novel view (model in database); novel model (models for other objects of the same category in database); and novel category (no models from the category in database).
Title: Scene labeling with LSTM recurrent neural networks
Authors: Wonmin Byeon, T. Breuel, Federico Raue, M. Liwicki
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298977
Abstract: This paper addresses the problem of pixel-level segmentation and classification of scene images with an entirely learning-based approach using Long Short-Term Memory (LSTM) recurrent neural networks, which are commonly used for sequence classification. We investigate two-dimensional (2D) LSTM networks for natural scene images, taking into account the complex spatial dependencies of labels. Prior methods generally have required separate classification and image segmentation stages and/or pre- and post-processing. In our approach, classification, segmentation, and context integration are all carried out by 2D LSTM networks, allowing texture and spatial model parameters to be learned within a single model. The networks efficiently capture local and global contextual information over raw RGB values and adapt well to complex scene images. Our approach, which has a much lower computational complexity than prior methods, achieved state-of-the-art performance on the Stanford Background and SIFT Flow datasets. In fact, if no pre- or post-processing is applied, LSTM networks outperform other state-of-the-art approaches. Hence, even on a single-core Central Processing Unit (CPU), the running time of our approach is comparable to or better than that of the compared state-of-the-art approaches, which use a Graphics Processing Unit (GPU). Finally, our networks' ability to visualize feature maps from each layer supports the hypothesis that LSTM networks are overall suited for image processing tasks.
{"title":"Query-adaptive late fusion for image search and person re-identification","authors":"Liang Zheng, Shengjin Wang, Lu Tian, Fei He, Ziqiong Liu, Q. Tian","doi":"10.1109/CVPR.2015.7298783","DOIUrl":"https://doi.org/10.1109/CVPR.2015.7298783","url":null,"abstract":"Feature fusion has been proven effective [35, 36] in image search. Typically, it is assumed that the to-be-fused heterogeneous features work well by themselves for the query. However, in a more realistic situation, one does not know in advance whether a feature is effective or not for a given query. As a result, it is of great importance to identify feature effectiveness in a query-adaptive manner.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"516 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123566700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Reconstructing the world* in six days
Authors: Jared Heinly, Johannes L. Schönberger, Enrique Dunn, Jan-Michael Frahm
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298949
Abstract: We propose a novel, large-scale, structure-from-motion framework that advances the state of the art in data scalability from city-scale modeling (millions of images) to world-scale modeling (several tens of millions of images) using just a single computer. The main enabling technology is the use of a streaming-based framework for connected component discovery. Moreover, our system employs an adaptive, online, iconic image clustering approach based on an augmented bag-of-words representation, in order to balance the goals of registration, comprehensiveness, and data compactness. We demonstrate our proposal by operating on a recent publicly available 100 million image crowd-sourced photo collection containing images geographically distributed throughout the entire world. Results illustrate that our streaming-based approach does not compromise model completeness, but achieves unprecedented levels of efficiency and scalability.
Title: ℋC-search for structured prediction in computer vision
Authors: Michael Lam, J. Doppa, S. Todorovic, Thomas G. Dietterich
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7299126
Abstract: The mainstream approach to structured prediction problems in computer vision is to learn an energy function such that the solution minimizes that function. At prediction time, this approach must solve an often-challenging optimization problem. Search-based methods provide an alternative that has the potential to achieve higher performance. These methods learn to control a search procedure that constructs and evaluates candidate solutions. The recently developed ℋC-Search method has been shown to achieve state-of-the-art results in natural language processing, but only mixed success when applied to vision problems. This paper studies whether ℋC-Search can achieve similarly competitive performance on basic vision tasks such as object detection, scene labeling, and monocular depth estimation, where the leading paradigm is energy minimization. To this end, we introduce a search operator suited to the vision domain that improves a candidate solution by probabilistically sampling likely object configurations in the scene from the hierarchical Berkeley segmentation. We complement this search operator by applying the DAgger algorithm to robustly train the search heuristic so it learns from its previous mistakes. Our evaluation shows that these improvements reduce the branching factor and search depth, and thus give a significant performance boost. Our state-of-the-art results on scene labeling and depth estimation suggest that ℋC-Search provides a suitable tool for learning and inference in vision.
Title: Reweighted Laplace prior based hyperspectral compressive sensing for unknown sparsity
Authors: Lei Zhang, Wei Wei, Yanning Zhang, Chunna Tian, Fei Li
Venue: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: https://doi.org/10.1109/CVPR.2015.7298840
Abstract: Compressive sensing (CS) has been exploited for hyperspectral image (HSI) compression in recent years. Though it can greatly reduce the costs of computation and storage, the reconstruction of an HSI from a few linear measurements is challenging. The underlying sparsity of the HSI is crucial to improving the reconstruction accuracy. However, the sparsity of an HSI is unknown in reality and varies with the noise, which makes sparsity estimation difficult. To address this problem, a novel reweighted Laplace prior based hyperspectral compressive sensing method is proposed in this study. First, the reweighted Laplace prior is proposed to model the distribution of sparsity in the HSI. Second, a latent variable Bayes model is employed to learn the optimal configuration of the reweighted Laplace prior from the measurements. The model unifies signal recovery, prior learning, and noise estimation into a variational framework to infer the parameters automatically. The learned sparsity prior represents the underlying structure of the sparse signal very well and is adaptive to the unknown noise, which improves the reconstruction accuracy of the HSI. Experimental results on three hyperspectral datasets demonstrate that the proposed method outperforms several state-of-the-art hyperspectral CS methods in reconstruction accuracy.