Title: Spatially-Excited Attention Learning for Fine-Grained Visual Categorization
Authors: Zhaozhi Luo, Min-Hsiang Hung, Yi-Wen Lu, Kuan-Wen Chen
DOI: 10.1109/ARIS56205.2022.9910447
Published in: 2022 International Conference on Advanced Robotics and Intelligent Systems (ARIS)
Publication date: 2022-08-24
Citations: 0
Abstract
Learning a distinguishable feature embedding plays an important role in fine-grained visual categorization. Existing methods focus either on designing a complex attention mechanism to boost overall classification performance, or on proposing a specific training strategy that enhances the learning of the backbone network and enables low-cost backbone-only inference. Unlike both, this paper proposes an alternative approach called Spatially-Excited Attention Learning (SEAL). SEAL is trained much like most existing methods, but it offers two alternative streams at inference time: one requires more computation but delivers higher performance; the other is a low-cost backbone-only inference with lower but still comparable performance. Note that both streams are trained jointly by SEAL. Experiments show that SEAL achieves state-of-the-art performance under both the complex-architecture and backbone-only inference conditions.
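The two-stream inference idea can be illustrated with a minimal sketch: a shared backbone and classifier serve both streams, while an attention module is used only by the high-effort stream. All names, dimensions, and the sigmoid-gate "excitation" below are illustrative assumptions for exposition, not the paper's actual SEAL architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper).
D_IN, D_FEAT, N_CLASSES = 32, 64, 10

# Shared backbone and classifier weights; the attention weights
# belong only to the high-effort stream.
W_backbone = rng.standard_normal((D_IN, D_FEAT)) * 0.1
W_attn = rng.standard_normal((D_FEAT, D_FEAT)) * 0.1
W_cls = rng.standard_normal((D_FEAT, N_CLASSES)) * 0.1

def backbone(x):
    """Stand-in for a CNN backbone: a single ReLU projection."""
    return np.maximum(x @ W_backbone, 0.0)

def spatial_excitation(feat):
    """Hypothetical attention: a sigmoid gate that re-weights
    ('excites') backbone features. SEAL's real mechanism is more
    involved; this only shows where attention sits in the pipeline."""
    gate = 1.0 / (1.0 + np.exp(-(feat @ W_attn)))
    return feat * gate

def infer(x, use_attention=True):
    """Two inference streams over one shared backbone:
    - use_attention=True : higher-cost attention stream
    - use_attention=False: low-cost backbone-only stream
    """
    feat = backbone(x)
    if use_attention:
        feat = spatial_excitation(feat)
    return feat @ W_cls  # class logits

x = rng.standard_normal((1, D_IN))
logits_full = infer(x, use_attention=True)    # shape (1, 10)
logits_light = infer(x, use_attention=False)  # shape (1, 10)
```

Because both streams share the backbone and classifier, joint training (as SEAL does) lets a deployment simply skip the attention branch when inference cost matters more than peak accuracy.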