Weakly Supervised Learning of Object-Part Attention Model for Fine-Grained Image Classification
Chenxi Lei, Linfeng Jiang, Jingshen Ji, Weilin Zhong, Huilin Xiong
2018 IEEE 18th International Conference on Communication Technology (ICCT), published 2018-10-01
DOI: 10.1109/ICCT.2018.8600125 (https://doi.org/10.1109/ICCT.2018.8600125)
Citations: 2
Abstract
Fine-grained classification is challenging because of the small inter-class variance and large intra-class variance among fine-grained categories. The key to solving this problem is to locate the discriminative parts of the image. In this paper we propose a weakly supervised method for fine-grained classification that needs only image-level labels. In our model, a convolutional neural network (CNN) locates the discriminative region through attention, then automatically focuses on subtler features by zooming in on that region and feeding it to the next CNN. A Squeeze-and-Excitation (SE) module is employed for channel-wise attention, and a spatial constraint loss is used to preserve the diversity of the located parts. We conduct experiments on the CUB-200-2011, Stanford Dogs, and Stanford Cars datasets to evaluate the performance of our model. The experimental results demonstrate the effectiveness of the proposed method compared with other methods.
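The channel-wise attention mentioned in the abstract can be illustrated with a generic Squeeze-and-Excitation block. The following is a minimal NumPy sketch of the standard SE computation (global average pooling, a two-layer bottleneck, sigmoid gating), not the authors' implementation. The function name `se_block`, the weight names `w_reduce` and `w_expand`, the reduction ratio `r`, and the omission of bias terms are all assumptions made for illustration.

```python
import numpy as np


def se_block(features, w_reduce, w_expand):
    """Apply Squeeze-and-Excitation channel attention to one feature map.

    features: array of shape (C, H, W)
    w_reduce: array of shape (C // r, C), first fully connected layer (hypothetical name)
    w_expand: array of shape (C, C // r), second fully connected layer (hypothetical name)
    Bias terms are left out to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))                # shape (C,)
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid.
    hidden = np.maximum(w_reduce @ squeezed, 0.0)        # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))   # shape (C,), values in (0, 1)
    # Scale: reweight every channel by its gate via broadcasting.
    return features * gates[:, None, None], gates


# Toy usage with random inputs; the sizes are arbitrary.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
features = rng.standard_normal((C, H, W))
w_reduce = rng.standard_normal((C // r, C)) * 0.1
w_expand = rng.standard_normal((C, C // r)) * 0.1
reweighted, gates = se_block(features, w_reduce, w_expand)
```

Each gate lies strictly between 0 and 1, so informative channels are kept close to their original magnitude while others are suppressed. In a trained network the two weight matrices would be learned together with the rest of the CNN.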