Cross-layer frequency-spatial domain feature interaction awareness fusion for fine-grained visual classification

IF 15.5, Zone 1 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion Pub Date : 2025-09-29 DOI:10.1016/j.inffus.2025.103788

Guanglei Sheng , Gang Hu , Xiaofeng Wang , Wei Chen , Jinling Jiang , Quanquan Xiao

Information Fusion, Volume 127, Article 103788. https://www.sciencedirect.com/science/article/pii/S1566253525008504

Citations: 0

Abstract

Fine-Grained Visual Classification (FGVC) must distinguish between highly similar categories, so existing FGVC methods mainly extract spatially local detail features that are discriminative for classification. These local detail features usually consist of high-frequency information. Given the importance of high-frequency information for FGVC, this paper introduces a Cross-layer Frequency-spatial Domain Feature Interaction Awareness Fusion (CFD-FIAF) method with four main components. First, a Frequency-domain Feature Complementary Module (FFCM) reinforces the high-frequency detail lost during the layer-by-layer downsampling of the backbone network. Second, to extract the frequency-domain features of interest, a Frequency-domain Feature Awareness Module (FFAM) enhances the representation of discriminative local features while maintaining the global structure at each granularity. Finally, to address inconsistent category predictions across features of different granularity, a Graph Convolutional Network Feature Fusion Module (GCN-FFM) and a Prediction Consistency Distillation Loss (PCDL) enhance the high-level semantic feature representation by fusing the different discriminative features at each granularity. Experimental results show that the proposed method achieves competitive performance on four standard FGVC benchmarks. Notably, it also achieves 99.8% accuracy on our in-house Bulbus Fritillaria (a traditional Chinese medicine) dataset, highlighting its potential for fine-grained classification of traditional Chinese medicines.
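
The abstract's premise, that fine local detail lives in the high-frequency part of the spectrum, can be illustrated with a small NumPy sketch. This is not the paper's FFCM or FFAM; it is a generic ideal high-pass/low-pass split using a 2-D FFT, and the function name `split_frequencies` and the `radius` cutoff are assumptions introduced only for illustration.

```python
import numpy as np


def split_frequencies(image: np.ndarray, radius: float):
    """Split a 2-D array into low- and high-frequency parts.

    Illustrative only: uses an ideal circular mask in the shifted
    Fourier spectrum, not the method described in the paper.
    """
    h, w = image.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius  # True for frequencies near DC
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high


# A random "image": the two parts always add back up to the input.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
low, high = split_frequencies(img, radius=4.0)
assert np.allclose(low + high, img)
```

Because the two masks partition the spectrum, the low- and high-frequency parts sum back to the input. A network stage that keeps only a coarse, downsampled view retains mostly the `low` part, which is the kind of loss the abstract says the FFCM is designed to compensate for.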


Source journal

Information Fusion (Engineering & Technology, Computer Science: Theory and Methods)

CiteScore

33.20

Self-citation rate

4.30%

Articles published

161

Review time

7.9 months

Journal description: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.