Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin
{"title":"Hyper-class augmented and regularized deep learning for fine-grained image classification","authors":"Saining Xie, Tianbao Yang, Xiaoyu Wang, Yuanqing Lin","doi":"10.1109/CVPR.2015.7298880","DOIUrl":null,"url":null,"abstract":"Deep convolutional neural networks (CNN) have seen tremendous success in large-scale generic object recognition. In comparison with generic object recognition, fine-grained image classification (FGIC) is much more challenging because (i) fine-grained labeled data is much more expensive to acquire (usually requiring domain expertise); (ii) there exists large intra-class and small inter-class variance. Most recent work exploiting deep CNN for image recognition with small training data adopts a simple strategy: pre-train a deep CNN on a large-scale external dataset (e.g., ImageNet) and fine-tune on the small-scale target data to fit the specific classification task. In this paper, beyond the fine-tuning strategy, we propose a systematic framework of learning a deep CNN that addresses the challenges from two new perspectives: (i) identifying easily annotated hyper-classes inherent in the fine-grained data and acquiring a large number of hyper-class-labeled images from readily available external sources (e.g., image search engines), and formulating the problem into multitask learning; (ii) a novel learning model by exploiting a regularization between the fine-grained recognition model and the hyper-class recognition model. We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.","PeriodicalId":444472,"journal":{"name":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"297 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"170","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2015.7298880","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 170
Abstract
Deep convolutional neural networks (CNNs) have seen tremendous success in large-scale generic object recognition. Compared with generic object recognition, fine-grained image classification (FGIC) is much more challenging because (i) fine-grained labeled data is much more expensive to acquire (usually requiring domain expertise), and (ii) there is large intra-class and small inter-class variance. Most recent work exploiting deep CNNs for image recognition with small training data adopts a simple strategy: pre-train a deep CNN on a large-scale external dataset (e.g., ImageNet) and fine-tune it on the small-scale target data to fit the specific classification task. In this paper, going beyond the fine-tuning strategy, we propose a systematic framework for learning a deep CNN that addresses these challenges from two new perspectives: (i) identifying easily annotated hyper-classes inherent in the fine-grained data, acquiring a large number of hyper-class-labeled images from readily available external sources (e.g., image search engines), and formulating the problem as multi-task learning; and (ii) a novel learning model that exploits a regularization between the fine-grained recognition model and the hyper-class recognition model. We demonstrate the success of the proposed framework on two small-scale fine-grained datasets (Stanford Dogs and Stanford Cars) and on a large-scale car dataset that we collected.
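To make the multi-task idea concrete, below is a minimal PyTorch sketch of the kind of architecture the abstract describes: a shared CNN trunk with two heads, one predicting fine-grained labels and one predicting auxiliary hyper-class labels, trained with a weighted joint loss plus a coupling term between the two classifiers. This is not the authors' exact formulation; the trunk layout, the names (`HyperClassAugmentedNet`, `lambda_hyper`, `lambda_reg`), and the specific regularizer are illustrative assumptions.

```python
# Sketch of hyper-class augmented multi-task learning (illustrative, not the paper's exact model).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperClassAugmentedNet(nn.Module):
    def __init__(self, num_fine_classes, num_hyper_classes, feat_dim=256):
        super().__init__()
        # Shared convolutional trunk (stand-in for a pre-trained CNN backbone).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(inplace=True),
        )
        # Two task-specific classifiers on top of the shared features.
        self.fine_head = nn.Linear(feat_dim, num_fine_classes)
        self.hyper_head = nn.Linear(feat_dim, num_hyper_classes)

    def forward(self, x):
        feats = self.trunk(x)
        return self.fine_head(feats), self.hyper_head(feats)

def multitask_loss(fine_logits, hyper_logits, fine_labels, hyper_labels,
                   model, lambda_hyper=0.5, lambda_reg=1e-3):
    # Fine-grained loss on target-domain images; the hyper-class loss can also be
    # computed on cheaply acquired external images (e.g., from image search engines).
    loss_fine = F.cross_entropy(fine_logits, fine_labels)
    loss_hyper = F.cross_entropy(hyper_logits, hyper_labels)
    # Illustrative coupling between the two classifiers, standing in for the paper's
    # regularization between the fine-grained and hyper-class recognition models.
    reg = (model.fine_head.weight.mean(dim=0)
           - model.hyper_head.weight.mean(dim=0)).pow(2).sum()
    return loss_fine + lambda_hyper * loss_hyper + lambda_reg * reg
```

In practice, fine-grained batches would come from the small labeled target set while hyper-class batches could be drawn from the larger, cheaply labeled external pool; the shared trunk and the coupling term are what let the auxiliary data regularize the fine-grained classifier.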