Webly Supervised Fine-Grained Classification by Integrally Tackling Noises and Subtle Differences

Junjie Chen;Jiebin Yan;Yuming Fang;Li Niu
Journal: IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society), vol. 34, pp. 2641-2653
Publication type: Journal Article
Publication date: 2025-04-25
DOI: 10.1109/TIP.2025.3562740
URL: https://ieeexplore.ieee.org/document/10977734/
Citations: 0

Abstract

Webly-supervised fine-grained visual classification (WSL-FGVC) aims to learn similar sub-classes from cheap web images, but it suffers from two major issues: label noise in web images and subtle differences among fine-grained classes. Existing WSL-FGVC methods focus only on suppressing noise at the image level and neglect to mine pixel-level cues that distinguish the subtle differences among fine-grained classes. In this paper, we propose a bag-level top-down attention framework that tackles label noise and mines subtle cues simultaneously and integrally. Specifically, our method first extracts high-level semantic information from a bag of images belonging to the same class, and then uses this bag-level information to mine discriminative regions at various scales in each image. In addition, we derive attention weights from the attention maps and use them to weight the bag-level fusion, yielding more robust supervision. We also propose an attention loss on self-bag attention and cross-bag attention to facilitate the learning of valid attention. Extensive experiments on four WSL-FGVC datasets (Web-Aircraft, Web-Bird, Web-Car, and WebiNat-5089) demonstrate the effectiveness of our method compared with state-of-the-art methods.
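The abstract does not give formulas, so the following is only a minimal sketch of what attention-weighted bag-level fusion might look like, not the authors' implementation. Everything in it is an assumption made for illustration: the function name `bag_attention_fusion`, the use of a mean-pooled bag query, dot-product similarity, and taking each image's peak similarity as its attention weight. It uses plain Python lists in place of real CNN feature maps.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[Vector]  # one C-dim feature vector per spatial location


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def bag_attention_fusion(bag: List[FeatureMap]) -> Tuple[Vector, Vector]:
    """Fuse a bag of same-class images using attention-derived weights.

    Assumed recipe (hypothetical, not taken from the paper):
      1. bag query   = mean of the per-image average-pooled features
      2. attention   = softmax over locations of <query, local feature>
      3. image score = highest query/location similarity in that image
      4. bag feature = softmax(image scores)-weighted sum of attended features
    """
    query = mean_vector([mean_vector(fmap) for fmap in bag])
    channels = range(len(query))

    attended, scores = [], []
    for fmap in bag:
        sims = [dot(query, loc) for loc in fmap]
        attn = softmax(sims)  # top-down attention map over locations
        attended.append(
            [sum(a * loc[c] for a, loc in zip(attn, fmap)) for c in channels]
        )
        scores.append(max(sims))  # assumed proxy for "attention weight"

    weights = softmax(scores)  # images that disagree with the bag get less weight
    fused = [sum(w * feat[c] for w, feat in zip(weights, attended)) for c in channels]
    return fused, weights


# Toy bag: two images that agree and one that looks mislabeled.
bag = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.8, 0.2], [1.0, 0.0]],
    [[0.0, 1.0], [0.1, 0.9]],
]
fused, weights = bag_attention_fusion(bag)
print(weights)
```

In this toy bag the third image disagrees with the bag-level query, so it receives the smallest fusion weight. That is the intuition behind using attention to suppress label noise, although the paper's actual weighting scheme and loss may differ substantially.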