Context-Semantic Quality Awareness Network for fine-grained visual categorization

IF 7.6 · Region 1 (Computer Science) · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition · Pub Date: 2025-07-01 · DOI: 10.1016/j.patcog.2025.112033

Qin Xu, Sitong Li, Jiahui Wang, Bo Jiang, Bin Luo, Jinhui Tang

Volume 170, Article 112033

Source: https://www.sciencedirect.com/science/article/pii/S0031320325006934

Citations: 0

Abstract

Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, the existing FGVC methods cannot mine discriminative features from low-quality samples, leading to a significant decline in performance. To address this issue, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC. Specifically, to assess and enhance the quality of multi-granularity visual representations, we propose the Multi-level Semantic Quality Evaluation (MSQE) module, composed of the Quality Probing (QP) classifier. To alleviate the scale confusion problems and accurately identify the local distinctive regions, the part navigator is developed. Moreover, the Multi-part and Multi-scale Cross-Attention (MMCA) module is designed to model the spatial contextual relationship between rich part descriptors and global semantics, thus capturing more discriminative details within the object. Finally, the context-aware features from MMCA and semantically enhanced features from MSQE are fed into the corresponding QP classifiers to evaluate the quality in real time, further boosting the discriminability. Comprehensive experiments on four popular and highly competitive datasets demonstrate the superiority of the proposed CSQA-Net in comparison with the state-of-the-art methods. Code is available at https://github.com/zmisiter/CSQA-Net.
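For illustration only: the abstract does not describe the internals of MMCA, so the sketch below shows generic scaled dot-product cross-attention in which part descriptors act as queries over global feature tokens. The names `parts` and `global_tokens`, the shapes, and the omission of learned projections are all assumptions made for this example; none of it is taken from the paper or its repository.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(parts, global_tokens):
    """Generic cross-attention (hypothetical stand-in, not the paper's MMCA).

    parts:         (P, d) array, one descriptor per part, used as queries.
    global_tokens: (N, d) array of global features, used as keys and values.
    Returns the attended context (P, d) and the attention weights (P, N).
    """
    d = parts.shape[-1]
    scores = parts @ global_tokens.T / np.sqrt(d)  # (P, N) similarity scores
    weights = softmax(scores, axis=-1)             # each row sums to 1
    context = weights @ global_tokens              # (P, d) weighted mix of values
    return context, weights


# Toy inputs: 4 part descriptors and 16 global tokens, each of dimension 8.
rng = np.random.default_rng(0)
parts = rng.normal(size=(4, 8))
global_tokens = rng.normal(size=(16, 8))
context, weights = cross_attention(parts, global_tokens)
```

Each row of `weights` indicates how strongly one part descriptor attends to each global token. A real module would normally add learned query, key, and value projections and multiple heads.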

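Source journal

Pattern Recognition (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months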

About the journal: The field of pattern recognition is both mature and rapidly evolving, and it plays a crucial role in related fields such as computer vision, image processing, text analysis, and neural networks. It intersects closely with machine learning and is being applied in emerging areas such as biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.