Guanglei Sheng, Gang Hu, Xiaofeng Wang, Wei Chen, Jinling Jiang, Quanquan Xiao
Cross-layer frequency-spatial domain feature interaction awareness fusion for fine-grained visual classification
Information Fusion, Volume 127, Article 103788. Published 2025-09-29. DOI: 10.1016/j.inffus.2025.103788
https://www.sciencedirect.com/science/article/pii/S1566253525008504
To address the difficulty of distinguishing similar categories in Fine-Grained Visual Classification (FGVC), existing methods mainly extract spatially local detail features that are discriminative for classification. These local detail features usually consist of high-frequency information. Given the importance of high-frequency information for FGVC, this paper introduces a Cross-layer Frequency-spatial Domain Feature Interaction Awareness Fusion (CFD-FIAF) method with four main components. First, a Frequency-domain Feature Complementary Module (FFCM) reinforces the high-frequency detail lost during the layer-by-layer downsampling of the backbone network. Second, a Frequency-domain Feature Awareness Module (FFAM) extracts frequency-domain features of interest, enhancing the representation of discriminative local features while maintaining the global structure at each granularity. Finally, to address inconsistent category predictions across features of different granularity, a Graph Convolutional Network Feature Fusion Module (GCN-FFM) and a Prediction Consistency Distillation Loss (PCDL) are proposed; they enhance the high-level semantic feature representation by fusing the discriminative features at each granularity. Experimental results show that the proposed method achieves competitive performance on four standard FGVC benchmarks. Notably, it also achieved 99.8% accuracy on our in-house Bulbus Fritillaria (a traditional Chinese medicine) dataset, highlighting its potential for fine-grained classification of traditional Chinese medicines.
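The abstract's premise that local detail corresponds to high-frequency information can be illustrated with a generic FFT high-pass filter. The sketch below is not the authors' FFCM or FFAM, whose internals the abstract does not describe; the function name `high_frequency_component`, the circular mask, and the `cutoff_ratio` parameter are all assumptions chosen for illustration. It only shows what "the high-frequency part of a feature map" can mean in practice.

```python
import numpy as np


def high_frequency_component(feature_map, cutoff_ratio=0.25):
    """Return the high-frequency part of a 2D array.

    Hypothetical helper: applies a circular high-pass mask in the
    Fourier domain. `cutoff_ratio` (an assumed parameter) sets the mask
    radius as a fraction of half the shorter side.
    """
    h, w = feature_map.shape
    # Move the zero-frequency (DC) term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(feature_map))

    # Distance of every frequency bin from the spectrum centre.
    yy, xx = np.ogrid[:h, :w]
    radius = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)

    # Keep only the bins outside the cutoff radius (the high frequencies).
    cutoff = cutoff_ratio * min(h, w) / 2
    filtered = spectrum * (radius > cutoff)

    # Back to the spatial domain; the imaginary part is numerical noise.
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))


# A flat map has no detail, so its high-frequency part vanishes.
flat = np.ones((8, 8))
flat_high = high_frequency_component(flat)

# A checkerboard is pure fine detail, so it passes through unchanged.
rows, cols = np.indices((8, 8))
checker = np.where((rows + cols) % 2 == 0, 1.0, -1.0)
checker_high = high_frequency_component(checker)
```

In a network, such a filter would be applied per channel to intermediate feature maps; how CFD-FIAF actually re-injects the recovered detail across layers is specified in the paper, not here.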
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.