{"title":"面向细粒度视觉分类的GRAD-CAM引导通道-空间注意模块","authors":"Shuai Xu, Dongliang Chang, Jiyang Xie, Zhanyu Ma","doi":"10.1109/mlsp52302.2021.9596481","DOIUrl":null,"url":null,"abstract":"Fine-grained visual classification (FGVC) is becoming an important research field, due to its wide applications and the rapid development of computer vision technologies. The current state-of-the-art (SOTA) methods in the FGVC usually employ attention mechanisms to first capture the semantic parts and then discover their subtle differences between distinct classes. The existing attention modules have significantly improved the classification performance but they are poorly guided since part-based detectors in the FGVC depend on the network learning ability without the supervision of part annotations. As obtaining such part annotations is labor-expensive, some visual localization and explanation methods, such as gradient-weighted class activation mapping (Grad-CAM), can be utilized for supervising the attention mechanism. In this paper, we propose a Grad-CAM guided channel-spatial attention module for the FGVC, which employs the Grad-CAM to supervise and constrain the attention weights by generating the coarse localization maps. To demonstrate the effectiveness of the proposed method, we conduct comprehensive experiments on three popular FGVC datasets, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets. The proposed method outperforms the SOTA attention modules in the FGVC task. 
In addition, visualizations of the feature maps demonstrate the superiority of the proposed method against the SOTA approaches.","PeriodicalId":156116,"journal":{"name":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"GRAD-CAM Guided Channel-Spatial Attention Module for Fine-Grained Visual Classification\",\"authors\":\"Shuai Xu, Dongliang Chang, Jiyang Xie, Zhanyu Ma\",\"doi\":\"10.1109/mlsp52302.2021.9596481\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Fine-grained visual classification (FGVC) is becoming an important research field, due to its wide applications and the rapid development of computer vision technologies. The current state-of-the-art (SOTA) methods in the FGVC usually employ attention mechanisms to first capture the semantic parts and then discover their subtle differences between distinct classes. The existing attention modules have significantly improved the classification performance but they are poorly guided since part-based detectors in the FGVC depend on the network learning ability without the supervision of part annotations. As obtaining such part annotations is labor-expensive, some visual localization and explanation methods, such as gradient-weighted class activation mapping (Grad-CAM), can be utilized for supervising the attention mechanism. In this paper, we propose a Grad-CAM guided channel-spatial attention module for the FGVC, which employs the Grad-CAM to supervise and constrain the attention weights by generating the coarse localization maps. To demonstrate the effectiveness of the proposed method, we conduct comprehensive experiments on three popular FGVC datasets, including CUB-200-2011, Stanford Cars, and FGVC-Aircraft datasets. 
The proposed method outperforms the SOTA attention modules in the FGVC task. In addition, visualizations of the feature maps demonstrate the superiority of the proposed method against the SOTA approaches.\",\"PeriodicalId\":156116,\"journal\":{\"name\":\"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-01-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/mlsp52302.2021.9596481\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/mlsp52302.2021.9596481","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11
Abstract
GRAD-CAM Guided Channel-Spatial Attention Module for Fine-Grained Visual Classification
Fine-grained visual classification (FGVC) is becoming an important research field, owing to its wide applications and the rapid development of computer vision technologies. Current state-of-the-art (SOTA) FGVC methods usually employ attention mechanisms to first capture semantic parts and then discover the subtle differences between distinct classes. Existing attention modules have significantly improved classification performance, but they are poorly guided: part-based detectors in FGVC depend on the network's learning ability alone, without the supervision of part annotations. As obtaining such part annotations is labor-intensive, visual localization and explanation methods, such as gradient-weighted class activation mapping (Grad-CAM), can instead be used to supervise the attention mechanism. In this paper, we propose a Grad-CAM guided channel-spatial attention module for FGVC, which employs Grad-CAM to supervise and constrain the attention weights by generating coarse localization maps. To demonstrate the effectiveness of the proposed method, we conduct comprehensive experiments on three popular FGVC datasets: CUB-200-2011, Stanford Cars, and FGVC-Aircraft. The proposed method outperforms SOTA attention modules on the FGVC task, and visualizations of the feature maps further demonstrate its superiority over the SOTA approaches.
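The abstract does not specify the exact supervision loss, but the core mechanism it names — computing a coarse Grad-CAM localization map and using it to constrain a spatial attention map — can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: the feature maps and gradients are assumed to have been extracted from some convolutional layer, and the mean-squared-error supervision term is an assumed, plausible choice.

```python
import numpy as np

def grad_cam_map(feature_maps, gradients):
    """Coarse Grad-CAM localization map.

    feature_maps, gradients: (C, H, W) arrays taken from a conv layer's
    activations and the gradients of the class score w.r.t. them.
    """
    # Grad-CAM's channel weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))                      # shape (C,)
    # Weighted combination of the feature maps, followed by ReLU.
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    # Normalize to [0, 1] so the map can serve as a soft supervision target.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam                                                 # shape (H, W)

def attention_supervision_loss(spatial_attention, cam):
    """Assumed MSE penalty pulling a spatial attention map toward the
    (fixed) Grad-CAM map -- one way to 'supervise and constrain' attention."""
    return float(np.mean((spatial_attention - cam) ** 2))

# Toy usage with random tensors standing in for real network outputs.
rng = np.random.default_rng(0)
feats = rng.random((8, 7, 7))
grads = rng.standard_normal((8, 7, 7))
cam = grad_cam_map(feats, grads)
attn = rng.random((7, 7))
loss = attention_supervision_loss(attn, cam)
```

Minimizing such a term during training would encourage the learned spatial attention to agree with the Grad-CAM evidence, which is the role the paper assigns to Grad-CAM in place of manual part annotations.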