Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition

Jianlong Fu, Heliang Zheng, Tao Mei
{"title":"Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition","authors":"Jianlong Fu, Heliang Zheng, Tao Mei","doi":"10.1109/CVPR.2017.476","DOIUrl":null,"url":null,"abstract":"Recognizing fine-grained categories (e.g., bird species) is difficult due to the challenges of discriminative region localization and fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that region detection and fine-grained feature learning are mutually correlated and thus can reinforce each other. In this paper, we propose a novel recurrent attention convolutional neural network (RA-CNN) which recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutual reinforced way. The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN). The APN starts from full images, and iteratively generates region attention from coarse to fine by taking previous prediction as a reference, while the finer scale network takes as input an amplified attended region from previous scale in a recurrent way. The proposed RA-CNN is optimized by an intra-scale classification loss and an inter-scale ranking loss, to mutually learn accurate region attention and fine-grained representation. RA-CNN does not need bounding box/part annotations and can be trained end-to-end. We conduct comprehensive experiments and show that RA-CNN achieves the best performance in three fine-grained tasks, with relative accuracy gains of 3.3%, 3.7%, 3.8%, on CUB Birds, Stanford Dogs and Stanford Cars, respectively.","PeriodicalId":6631,"journal":{"name":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"15 1","pages":"4476-4484"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1037","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2017.476","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1037

Abstract

Recognizing fine-grained categories (e.g., bird species) is difficult due to the challenges of discriminative region localization and fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that region detection and fine-grained feature learning are mutually correlated and thus can reinforce each other. In this paper, we propose a novel recurrent attention convolutional neural network (RA-CNN) which recursively learns discriminative region attention and region-based feature representation at multiple scales in a mutual reinforced way. The learning at each scale consists of a classification sub-network and an attention proposal sub-network (APN). The APN starts from full images, and iteratively generates region attention from coarse to fine by taking previous prediction as a reference, while the finer scale network takes as input an amplified attended region from previous scale in a recurrent way. The proposed RA-CNN is optimized by an intra-scale classification loss and an inter-scale ranking loss, to mutually learn accurate region attention and fine-grained representation. RA-CNN does not need bounding box/part annotations and can be trained end-to-end. We conduct comprehensive experiments and show that RA-CNN achieves the best performance in three fine-grained tasks, with relative accuracy gains of 3.3%, 3.7%, 3.8%, on CUB Birds, Stanford Dogs and Stanford Cars, respectively.
看得近看得好:用于细粒度图像识别的循环注意卷积神经网络
由于判别区域定位和细粒度特征学习的挑战,识别细粒度类别(如鸟类)是困难的。现有的方法主要是独立解决这些挑战,而忽略了区域检测和细粒度特征学习是相互关联的,因此可以相互加强。本文提出了一种新的递归注意卷积神经网络(RA-CNN),该网络在多尺度上以相互强化的方式递归学习判别区域注意和基于区域的特征表示。每个尺度上的学习由分类子网络和注意建议子网络(APN)组成。APN从完整图像开始,以之前的预测为参考,由粗到细迭代生成区域关注,而更精细的尺度网络以循环的方式从之前的尺度中放大一个被关注的区域作为输入。本文提出的RA-CNN通过尺度内分类损失和尺度间排序损失进行优化,以相互学习准确的区域关注和细粒度表示。RA-CNN不需要边界框/部分注释,可以端到端进行训练。我们进行了全面的实验,结果表明,RA-CNN在三个细粒度任务中取得了最好的性能,在CUB Birds、Stanford Dogs和Stanford Cars上的相对准确率分别提高了3.3%、3.7%和3.8%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信