Cross-layer Navigation Convolutional Neural Network for Fine-grained Visual Classification

ACM Multimedia Asia. Pub Date: 2021-06-21. DOI: 10.1145/3469877.3490579

Chenyu Guo, Jiyang Xie, K. Liang, Xian Sun, Zhanyu Ma

Abstract

Fine-grained visual classification (FGVC) aims to classify sub-classes of objects within the same super-class (e.g., species of birds, models of cars). The key to FGVC is finding subtle discriminative information about the target in local regions. Traditional FGVC models prefer refined features, i.e., high-level semantic information, for recognition and rarely use low-level information. However, low-level information, which contains rich detail, has been shown to improve performance as well. In this paper, we therefore propose a cross-layer navigation convolutional neural network for feature fusion. First, the feature maps extracted by the backbone network are fed sequentially, from high level to low level, into a convolutional long short-term memory (ConvLSTM) model to aggregate features. Then, attention mechanisms applied after the fusion extract spatial and channel information while linking high-level semantic information with low-level texture features, which better locates the discriminative regions for FGVC. Experiments on three commonly used FGVC datasets, CUB-200-2011, Stanford-Cars, and FGVC-Aircraft, show that the proposed method achieves superior results compared with other FGVC methods. Code: https://github.com/PRIS-CV/CN-CNN.git
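The two-stage pipeline described in the abstract (ConvLSTM aggregation of backbone feature maps from high level to low level, followed by channel and spatial attention) can be illustrated with a minimal NumPy sketch. This is not the authors' code (see the repository linked above), and several details are assumptions: the three feature maps are taken to be already resized to a common shape, the ConvLSTM uses 3x3 gate convolutions, and the attention modules are CBAM-style channel and spatial gates swapped in for illustration. The names `conv2d`, `ConvLSTMCell`, `cross_layer_fuse`, `channel_attention`, and `spatial_attention` are hypothetical, and all weights are random rather than trained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d(x, w, b):
    """Zero-padded 'same' 2-D convolution (cross-correlation).
    x: (C_in, H, W), w: (C_out, C_in, k, k), b: (C_out,) -> (C_out, H, W)."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.empty((c_out, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


class ConvLSTMCell:
    """Minimal ConvLSTM cell: all four gates come from one conv over [x, h].
    Assumption: 3x3 kernels and random (untrained) weights."""

    def __init__(self, channels, hidden, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden = hidden
        self.w = rng.normal(0.0, 0.1, (4 * hidden, channels + hidden, k, k))
        self.b = np.zeros(4 * hidden)

    def step(self, x, h, c):
        gates = conv2d(np.concatenate([x, h], axis=0), self.w, self.b)
        i, f, o, g = np.split(gates, 4, axis=0)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        return h, c


def cross_layer_fuse(features, cell):
    """Feed feature maps into the ConvLSTM in list order (high level first)."""
    _, hgt, wid = features[0].shape
    h = np.zeros((cell.hidden, hgt, wid))
    c = np.zeros_like(h)
    for x in features:
        h, c = cell.step(x, h, c)
    return h


def channel_attention(x, w1, w2):
    """CBAM/SE-style channel gate (assumed): avg-pool -> MLP -> sigmoid."""
    squeezed = x.mean(axis=(1, 2))                          # (C,)
    weights = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))  # (C,)
    return x * weights[:, None, None]


def spatial_attention(x, w, b):
    """CBAM-style spatial gate (assumed): conv over channel mean and max."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    mask = sigmoid(conv2d(pooled, w, b))                # (1, H, W)
    return x * mask


rng = np.random.default_rng(0)
C, H, W, HID = 8, 7, 7, 8
# Hypothetical backbone outputs, assumed already resized to a common (C, H, W).
low, mid, high = (rng.normal(size=(C, H, W)) for _ in range(3))
cell = ConvLSTMCell(C, HID)
aggregated = cross_layer_fuse([high, mid, low], cell)  # high -> low order
fused = channel_attention(aggregated, rng.normal(size=(HID // 2, HID)),
                          rng.normal(size=(HID, HID // 2)))
fused = spatial_attention(fused, rng.normal(size=(1, 2, 3, 3)), np.zeros(1))
print(fused.shape)  # (8, 7, 7)
```

Because a ConvLSTM is order-sensitive, feeding the maps low-to-high instead gives a different aggregate; the high-to-low order described in the abstract is therefore a deliberate design choice, presumably letting semantic context from deep layers condition how the later, more detailed low-level maps are absorbed.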
