Webly Supervised Fine-Grained Classification by Integrally Tackling Noises and Subtle Differences

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2025-04-25. DOI: 10.1109/TIP.2025.3562740

Junjie Chen, Jiebin Yan, Yuming Fang, Li Niu

Volume: 34. Pages: 2641-2653. Publication type: Journal Article. URL: https://ieeexplore.ieee.org/document/10977734/

Citations: 0

Abstract

Webly-supervised fine-grained visual classification (WSL-FGVC) aims to learn similar sub-classes from cheap web images. It suffers from two major issues: label noise in web images and subtle differences among fine-grained classes. Existing WSL-FGVC methods focus only on suppressing noise at the image level and neglect to mine pixel-level cues that distinguish the subtle differences among fine-grained classes. In this paper, we propose a bag-level top-down attention framework that tackles label noise and mines subtle cues simultaneously and integrally. Specifically, our method first extracts high-level semantic information from a bag of images belonging to the same class, and then uses this bag-level information to mine discriminative regions at various scales in each image. In addition, we derive attention weights from the attention maps and use them to weight the bag-level fusion for robust supervision. We also propose an attention loss on self-bag attention and cross-bag attention to facilitate the learning of valid attention. Extensive experiments on four WSL-FGVC datasets, i.e., Web-Aircraft, Web-Bird, Web-Car, and WebiNat-5089, demonstrate the effectiveness of our method against state-of-the-art methods.
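The abstract does not give formulas for the attention-weighted bag-level fusion, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes, purely for illustration, that each image's weight is the softmax of the mean activation of its attention map, and that the bag-level feature is the weighted average of per-image features. The names `attention_score`, `softmax`, and `fuse_bag` and the toy data are all hypothetical.

```python
import math
from typing import List

Vector = List[float]
AttentionMap = List[List[float]]


def attention_score(attn_map: AttentionMap) -> float:
    """Hypothetical per-image score: mean activation of the attention map."""
    values = [v for row in attn_map for v in row]
    return sum(values) / len(values)


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_bag(features: List[Vector], attn_maps: List[AttentionMap]) -> Vector:
    """Weighted average of per-image features (assumed fusion rule).

    Images whose attention maps respond weakly get a smaller weight,
    so they contribute less to the bag-level feature.
    """
    weights = softmax([attention_score(m) for m in attn_maps])
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


# Toy bag: two images with strong attention responses and one with a
# weak response, standing in for a possibly mislabeled web image.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attn_maps = [
    [[0.9, 0.8], [0.7, 0.9]],
    [[0.8, 0.9], [0.9, 0.8]],
    [[0.1, 0.0], [0.1, 0.0]],
]
weights = softmax([attention_score(m) for m in attn_maps])
bag_feature = fuse_bag(features, attn_maps)
```

In this toy bag the third image receives the smallest weight, so the fused feature stays closer to the two consistent images. A real system would compute features and attention maps with a neural network and would likely use a different scoring rule.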

Source journal

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Number of publications