Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) | Pub Date: 2017-07-21 | DOI: 10.1109/CVPR.2017.476

Jianlong Fu, Heliang Zheng, Tao Mei

Pages: 4476-4484

Citations: 1037

Abstract

Recognizing fine-grained categories (e.g., bird species) is difficult because it requires both discriminative region localization and fine-grained feature learning. Existing approaches predominantly address these two challenges independently, neglecting the fact that region detection and fine-grained feature learning are mutually correlated and can therefore reinforce each other. In this paper, we propose a novel recurrent attention convolutional neural network (RA-CNN) that recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutually reinforcing way. The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN). The APN starts from the full image and iteratively generates region attention from coarse to fine, taking the previous prediction as a reference, while the finer-scale network recurrently takes as input an amplified attended region from the previous scale. The proposed RA-CNN is optimized with an intra-scale classification loss and an inter-scale ranking loss, so that accurate region attention and fine-grained representation are learned jointly. RA-CNN does not need bounding box or part annotations and can be trained end-to-end. We conduct comprehensive experiments and show that RA-CNN achieves the best performance on three fine-grained tasks, with relative accuracy gains of 3.3%, 3.7%, and 3.8% on CUB Birds, Stanford Dogs, and Stanford Cars, respectively.
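The abstract names two training signals, an intra-scale classification loss and an inter-scale ranking loss, without giving their form. The following is a minimal sketch of how such an objective could be combined, not the authors' implementation. It assumes a cross-entropy classification loss at each scale and a hinge-style ranking loss that asks the finer scale to assign a higher true-class probability than the coarser one by some margin. The function names, the plain summation of the two terms, and the margin value of 0.05 are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_loss(probs, label):
    """Intra-scale loss (assumed form): cross-entropy on the true class."""
    return -math.log(probs[label])


def ranking_loss(p_coarse, p_fine, margin=0.05):
    """Inter-scale loss (assumed hinge form).

    Zero only when the finer scale's true-class probability exceeds the
    coarser scale's by at least `margin`; the margin value is illustrative.
    """
    return max(0.0, p_coarse - p_fine + margin)


def ra_cnn_loss(logits_per_scale, label, margin=0.05):
    """Sum classification losses over all scales and ranking losses over
    each adjacent (coarse, fine) pair of scales."""
    probs = [softmax(logits) for logits in logits_per_scale]
    cls = sum(classification_loss(p, label) for p in probs)
    rank = sum(
        ranking_loss(probs[s][label], probs[s + 1][label], margin)
        for s in range(len(probs) - 1)
    )
    return cls + rank


if __name__ == "__main__":
    # Hypothetical logits for one image at three scales (coarse to fine),
    # three classes, true class index 0.
    logits = [[1.0, 0.5, 0.2], [1.5, 0.4, 0.1], [2.2, 0.3, 0.0]]
    print(ra_cnn_loss(logits, label=0))
```

Under these assumptions, the ranking term vanishes once each finer scale is sufficiently more confident in the true class than the scale before it, which is the pressure that would push the attention proposal toward more discriminative regions.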


