Multi-branch selection fusion fine-grained classification algorithm based on coordinate attention localization

IF 1.4 · Zone 4 (Computer Science) · Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

AI Communications · Pub Date: 2023-08-21 · DOI: 10.3233/aic-220187

Feng Zhang, Gaocai Wang, Man Wu, Shuqiang Huang


Citations: 0

Abstract

Object localization has long been a focus of research in Fine-Grained Visual Categorization (FGVC). To improve the accuracy and precision of object localization in multi-branch networks, as well as the robustness and generality of localization methods, our study focuses on how to combine coordinate attention with feature activation maps for target localization. The proposed model has three branches: a raw branch, an object branch, and a part branch. Images are fed directly into the raw branch. The Coordinate Attention Object Localization Module (CAOLM) localizes and crops the object in each image to generate the input for the object branch. The Attention Partial Proposal Module (APPM) proposes part regions at different scales. The three kinds of input images undergo end-to-end weakly supervised learning through the different branches of the network. The model expands the receptive field to capture multi-scale features with the Selective Branch Atrous Spatial Pooling Pyramid (SB-ASPP). The Selective Branch Block (SBBlock) fuses the feature maps obtained from the raw branch and the object branch, so that the complete features of the raw branch supplement the information missing from the object branch. Extensive experiments on the CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets show that our method achieves the best classification performance on FGVC-Aircraft and competitive performance on the other datasets. The model also has few parameters and fast inference speed.
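To make the "localize and crop" step concrete, here is a minimal sketch of how a feature activation map can be turned into a crop box. It is not the paper's CAOLM: the abstract does not give the thresholding rule, so the mean-activation threshold, the `stride` parameter, and the function names `localize_object` and `scale_box` are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def localize_object(activation: List[List[float]]) -> Box:
    """Return (top, left, bottom, right), inclusive, in feature-map cells.

    Assumption: a cell belongs to the object if its activation exceeds
    the mean of the whole map. The paper may use a different rule.
    """
    height, width = len(activation), len(activation[0])
    values = [v for row in activation for v in row]
    threshold = sum(values) / len(values)

    # Rows and columns that contain at least one above-threshold cell.
    rows = [r for r in range(height)
            if any(v > threshold for v in activation[r])]
    cols = [c for c in range(width)
            if any(activation[r][c] > threshold for r in range(height))]

    if not rows:
        # Flat map: nothing stands out, so keep the whole image.
        return 0, 0, height - 1, width - 1
    return rows[0], cols[0], rows[-1], cols[-1]


def scale_box(box: Box, stride: int) -> Box:
    """Map an inclusive cell box to pixel bounds (bottom/right exclusive).

    `stride` is a hypothetical downsampling factor between the input
    image and the feature map.
    """
    top, left, bottom, right = box
    return top * stride, left * stride, (bottom + 1) * stride, (right + 1) * stride


if __name__ == "__main__":
    activation_map = [
        [0.0, 0.1, 0.0, 0.0],
        [0.0, 0.9, 0.8, 0.0],
        [0.0, 0.7, 0.6, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    cell_box = localize_object(activation_map)
    print(cell_box)                 # (1, 1, 2, 2)
    print(scale_box(cell_box, 32))  # (32, 32, 96, 96)
```

In a pipeline like the one described, the pixel box would be used to crop the raw image, and the crop would be resized and passed to the object branch. The bounding box of all above-threshold cells is the simplest choice; keeping only the largest connected region is a common alternative when the map is noisy.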




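Source journal

AI Communications (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore

2.30

Self-citation rate

12.50%

Number of publications

Review time

4.5 months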

Journal description: AI Communications is a journal on artificial intelligence (AI) that has a close relationship with EurAI (European Association for Artificial Intelligence, formerly ECCAI). It covers the whole AI community: scientific institutions as well as commercial and industrial companies. AI Communications aims to enhance contacts and information exchange between AI researchers and developers, and to provide supranational information to those concerned with AI and advanced information processing. AI Communications publishes refereed articles concerning scientific and technical AI procedures, provided they are of sufficient interest to a large readership with both scientific and practical backgrounds. In addition, it contains high-level background material, both at the technical level and at the level of opinions, policies, and news.