Mask-Guided Feature Extraction and Augmentation for Ultra-Fine-Grained Visual Categorization

2021 Digital Image Computing: Techniques and Applications (DICTA) · Pub Date: 2021-09-16 · DOI: 10.1109/DICTA52665.2021.9647389

Zicheng Pan, Xiaohan Yu, Miaohua Zhang, Yongsheng Gao

Citations: 2

Abstract

While fine-grained visual categorization (FGVC) has advanced considerably in recent years, ultra-fine-grained visual categorization (Ultra-FGVC) remains understudied. FGVC aims to classify objects from the same species (i.e., very similar categories), whereas Ultra-FGVC targets the more challenging problem of classifying images at an ultra-fine granularity, where even human experts may fail to identify the visual differences. The challenges of Ultra-FGVC come mainly from two sources: first, Ultra-FGVC models are prone to overfitting because training samples are scarce; second, the inter-class variance among images is much smaller than in ordinary FGVC tasks, which makes it difficult to learn discriminative features for each class. To address these challenges, this paper proposes a mask-guided feature extraction and feature augmentation method that extracts discriminative and informative image regions and then uses them to augment the original feature map. The advantage of the proposed method is that the feature detection and extraction model requires only a small number of target-region samples annotated with bounding boxes for training; it can then automatically locate the target region in a large number of images in the dataset with high detection accuracy. Experimental results on two public datasets, compared against ten state-of-the-art benchmark methods, consistently demonstrate the effectiveness of the proposed method both visually and quantitatively.
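The abstract describes the augmentation step only at a high level: a detector locates a target region, and that region is used to augment the original feature map. The sketch below illustrates one plausible reading under two loudly stated assumptions: (a) the detector outputs an axis-aligned box in image pixels, and (b) augmentation is a residual re-weighting F + α(F ⊙ M) that amplifies backbone features inside the box after it is projected onto the feature grid. The helper names `box_to_feature_mask` and `mask_guided_augment` and the parameter `alpha` are hypothetical illustrations, not the paper's actual method or code.

```python
import numpy as np


def box_to_feature_mask(box, image_size, feature_size):
    """HYPOTHETICAL helper: project an image-space box onto a feature grid.

    box is (x0, y0, x1, y1) in pixels; image_size and feature_size are (H, W).
    Returns a float (H_feat, W_feat) mask: 1 on every cell the box overlaps.
    """
    img_h, img_w = image_size
    feat_h, feat_w = feature_size
    x0, y0, x1, y1 = box
    # Feature cells per image pixel along each axis.
    sx, sy = feat_w / img_w, feat_h / img_h
    # Expand outward (floor/ceil) so partially covered cells are kept.
    c0, c1 = int(np.floor(x0 * sx)), int(np.ceil(x1 * sx))
    r0, r1 = int(np.floor(y0 * sy)), int(np.ceil(y1 * sy))
    mask = np.zeros((feat_h, feat_w), dtype=np.float32)
    # Clip to the grid so boxes touching or crossing the border are safe.
    mask[max(r0, 0):min(r1, feat_h), max(c0, 0):min(c1, feat_w)] = 1.0
    return mask


def mask_guided_augment(feature_map, mask, alpha=1.0):
    """HYPOTHETICAL augmentation rule (an assumption): F + alpha * (F * M).

    feature_map has shape (C, H, W); mask has shape (H, W) and is broadcast
    over channels. The output keeps shape (C, H, W).
    """
    return feature_map + alpha * feature_map * mask[None, :, :]


# Usage: a 2-channel 4x4 feature map from a 32x32 image, with an assumed
# detector box covering the top-left quarter of the image.
feats = np.ones((2, 4, 4), dtype=np.float32)
m = box_to_feature_mask((0, 0, 16, 16), image_size=(32, 32), feature_size=(4, 4))
aug = mask_guided_augment(feats, m, alpha=1.0)
# Cells in rows 0-1, cols 0-1 become 2.0; all other cells stay 1.0.
print(aug[0])
```

Because the residual form preserves the (C, H, W) shape, it could be dropped in before pooling in an existing classifier without changing the head. Concatenating F and F ⊙ M along the channel axis is an equally plausible alternative reading if a wider head is acceptable.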