Information Fusion — Latest Articles

Modeling speaker-specific long-term context for emotion recognition in conversation
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-10-01 DOI: 10.1016/j.inffus.2025.103785
Haifeng Chen, Jing Li, Yan Li, Jian Li, Lang He, Dongmei Jiang
{"title":"Modeling speaker-specific long-term context for emotion recognition in conversation","authors":"Haifeng Chen ,&nbsp;Jing Li ,&nbsp;Yan Li ,&nbsp;Jian Li ,&nbsp;Lang He ,&nbsp;Dongmei Jiang","doi":"10.1016/j.inffus.2025.103785","DOIUrl":"10.1016/j.inffus.2025.103785","url":null,"abstract":"<div><div>Emotion recognition in conversation (ERC) is essential for enabling empathetic responses and fostering harmonious human-computer interaction. Modeling speaker-specific temporal dependencies can enhance the capture of speaker-sensitive emotional representations, thereby improving the understanding of emotional dynamics among speakers within a conversation. However, prior research has primarily focused on information available during speaking moments, neglecting contextual cues during silent moments, leading to incomplete and discontinuous representation of each speaker’s emotional context. This study addresses these limitations by proposing a novel framework named the Speaker-specific Long-term Context Encoding Network (SLCNet) for the ERC task. SLCNet is designed to capture the complete speaker-specific long-term context, including both speaking and non-speaking moments. Specifically, an attention-based multimodal fusion network is first employed to dynamically focus on key modalities for effective multimodal fusion. Then, two well-designed graph neural networks are utilized for feature completion by leveraging intra-speaker temporal context and inter-speaker interaction influence, respectively. Finally, a shared LSTM models the temporally complete and speaker-sensitive context for each speaker. The proposed SLCNet is jointly optimized for multiple speakers and trained in an end-to-end manner. Extensive experiments on benchmark datasets demonstrate the superior performance of SLCNet and its ability to effectively complete emotional representations during silent moments, highlighting its potential to advance ERC research.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103785"},"PeriodicalIF":15.5,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
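As a rough illustration of the attention-based multimodal fusion step this abstract describes, the following is a minimal PyTorch sketch; the module name, dimensions, and scoring scheme are illustrative assumptions, not the authors' SLCNet implementation. A small scorer weights each modality's utterance feature, and the fused representation is the attention-weighted sum.

```python
import torch
import torch.nn as nn

class AttentionModalityFusion(nn.Module):
    """Minimal sketch: score each modality feature and fuse by a weighted sum.
    Hypothetical stand-in for the attention-based fusion described in SLCNet."""

    def __init__(self, dim: int):
        super().__init__()
        # One shared scorer applied to every modality feature.
        self.scorer = nn.Sequential(nn.Linear(dim, dim // 2), nn.Tanh(), nn.Linear(dim // 2, 1))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_modalities, dim), e.g. text/audio/visual utterance features.
        scores = self.scorer(feats)                # (batch, num_modalities, 1)
        weights = torch.softmax(scores, dim=1)     # attention over modalities
        return (weights * feats).sum(dim=1)        # (batch, dim) fused representation

# Toy usage: fuse text/audio/visual features for a batch of 4 utterances.
fusion = AttentionModalityFusion(dim=128)
fused = fusion(torch.randn(4, 3, 128))
print(fused.shape)  # torch.Size([4, 128])
```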
LOBSTER: Bilateral global semantic enhancement for multimedia recommendation
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-30 DOI: 10.1016/j.inffus.2025.103778
Jinfeng Xu, Zheyu Chen, Wei Wang, Xiping Hu, Jiyi Liu, Edith C.H. Ngai
{"title":"LOBSTER: Bilateral global semantic enhancement for multimedia recommendation","authors":"Jinfeng Xu ,&nbsp;Zheyu Chen ,&nbsp;Wei Wang ,&nbsp;Xiping Hu ,&nbsp;Jiyi Liu ,&nbsp;Edith C.H. Ngai","doi":"10.1016/j.inffus.2025.103778","DOIUrl":"10.1016/j.inffus.2025.103778","url":null,"abstract":"<div><div>Multimedia information floods the Internet, subtly influencing human society. Combining multimedia information to alleviate the data sparsity problem is a popular way within the rapid development of recommender systems. However, many studies reveal that multimodal information can introduce cross-modality noise in some cases. A feasible solution to alleviate cross-modality noises is to enhance the common information among modalities. Recent advanced works enhance modality common information between users (via user-user graphs) or items (via item-item graphs) using extra homogeneous graphs. However, these additional homogeneous graph structures will inevitably bring huge computational costs. To better extract common information among modalities while reducing computational costs, we propose a bi<u>L</u>ateral gl<u>OB</u>al <u>S</u>eman<u>T</u>ic <u>E</u>nhancement for multimedia <u>R</u>ecommendation, which is called LOBSTER. Specifically, LOBSTER constructs two global semantic spaces for user and item representations, enhances global/common semantic features on both the user and item sides through additional learnable representations shared across multiple modalities. LOBSTER further incorporates a layer-refined Graph Convolutional Network (GCN) and a dynamic optimization to alleviate the over-smoothing problem and adjust attention levels for different modalities. Extensive experiments on three real-world datasets demonstrate that LOBSTER achieves competitive or superior performance compared to models incorporating homogeneous graphs, while providing an average 2.45<span><math><mo>×</mo></math></span> speedup and a 60.26 % reduction in memory usage. Our code is available at <span><span>https://github.com/Jinfeng-Xu/LOBSTER</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103778"},"PeriodicalIF":15.5,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
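To make the "learnable representations shared across multiple modalities" idea concrete, here is a minimal hedged sketch (not the released LOBSTER code): a small bank of learnable global semantic vectors is shared across modalities, and each modality-specific embedding attends to the bank and adds the retrieved global context. The bank size, attention form, and residual addition are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GlobalSemanticEnhancer(nn.Module):
    """Toy global semantic enhancement: a shared learnable bank of global vectors
    enriches modality-specific user/item embeddings without any homogeneous graph."""

    def __init__(self, dim: int, num_global: int = 8):
        super().__init__()
        self.global_bank = nn.Parameter(torch.randn(num_global, dim) * 0.02)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (num_nodes, dim) modality-specific user or item embeddings.
        attn = torch.softmax(emb @ self.global_bank.t(), dim=-1)  # (num_nodes, num_global)
        global_ctx = attn @ self.global_bank                      # (num_nodes, dim)
        return emb + global_ctx                                   # enhanced embeddings

enhancer = GlobalSemanticEnhancer(dim=64)
visual_items = torch.randn(100, 64)    # toy visual-modality item embeddings
textual_items = torch.randn(100, 64)   # toy textual-modality item embeddings
enhanced_visual = enhancer(visual_items)    # the same shared bank serves both modalities
enhanced_textual = enhancer(textual_items)
```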
Hallucination-resistant multimodal content generation through knowledge graph-based reinforcement learning
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-30 DOI: 10.1016/j.inffus.2025.103783
Liang Zeng, Xinyi Lin, Shanping Yu
{"title":"Hallucination-resistant multimodal content generation through knowledge graph-based reinforcement learning","authors":"Liang Zeng ,&nbsp;Xinyi Lin ,&nbsp;Shanping Yu","doi":"10.1016/j.inffus.2025.103783","DOIUrl":"10.1016/j.inffus.2025.103783","url":null,"abstract":"<div><div>Multimodal large models exhibit remarkable capabilities in understanding and generating content by integrating diverse types of data, including text and images. However, they face significant challenges related to hallucination in practical applications, where generated content may be inaccurate or misleading. To address these concerns, this study introduces a chain of thought framework for trusted content generation based on knowledge graph reinforcement learning to mitigate hallucinations effectively. This framework incorporates a chain of thought mechanism to enhance model reasoning, thereby improving interpretability. By leveraging a external structured knowledge graph, the framework optimizes the trajectory of the generated content, ensuring that outputs are informed by reliable contextual information. Furthermore, the use of reinforcement learning techniques bolsters the credibility of the generated responses. Experimental evaluations on the VQA-RAD and SLAKE datasets demonstrate that this approach achieves significant improvements in medical visual question answering tasks. This framework not only elevates the quality of content generation but also enhances the interpretability of the model.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103783"},"PeriodicalIF":15.5,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
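One plausible ingredient of a knowledge-graph-grounded reinforcement learning setup is a reward that checks generated factual triples against a reference knowledge graph. The sketch below is only an illustration under that assumption: the triple extraction step is stubbed out, the KG is a hand-built toy set, and nothing here reproduces the paper's framework.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

def kg_grounding_reward(predicted: List[Triple], kg: set) -> float:
    """Fraction of generated (subject, relation, object) triples found in the
    reference knowledge graph — a toy stand-in for a KG-based reward signal that
    an RL fine-tuning loop could maximize to discourage hallucinated facts."""
    if not predicted:
        return 0.0
    supported = sum(1 for t in predicted if t in kg)
    return supported / len(predicted)

# Toy example with a hand-built medical KG.
kg = {("pneumonia", "finding_on", "chest x-ray"),
      ("cardiomegaly", "finding_on", "chest x-ray")}
generated = [("pneumonia", "finding_on", "chest x-ray"),
             ("pneumonia", "treated_with", "antihistamine")]  # second triple unsupported
print(kg_grounding_reward(generated, kg))  # 0.5
```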
CRFFNet: A cross-view reprojection based feature fusion network for fine-grained building segmentation using satellite-view and street-view data
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-30 DOI: 10.1016/j.inffus.2025.103795
Jinhua Yu, Junyan Ye, Yi Lin, Weijia Li
{"title":"CRFFNet: A cross-view reprojection based feature fusion network for fine-grained building segmentation using satellite-view and street-view data","authors":"Jinhua Yu ,&nbsp;Junyan Ye ,&nbsp;Yi Lin,&nbsp;Weijia Li","doi":"10.1016/j.inffus.2025.103795","DOIUrl":"10.1016/j.inffus.2025.103795","url":null,"abstract":"<div><div>Fine-grained building attribute segmentation is crucial for rapidly acquiring urban geographic information and understanding urban development dynamics. To achieve a comprehensive perception of buildings, fusing cross-view data, which combines the wide coverage of satellite-view imagery with the detailed observations of street-view images, has become increasingly important. However, existing methods still struggle to effectively mitigate feature discrepancies across different views during cross-view fusion. To address this challenge, we propose the CRFFNet, a Cross-view Reprojection-based Feature Fusion Network for fine-grained building attribute segmentation. CRFFNet eliminates the perspective differences between satellite-view (satellite image and map data) and street-view features, enabling high-precision building attribute segmentation. Specifically, we introduce a deformable module to reduce target distortions in panoramic street-view images, and develop an Explicit Geometric Reprojection (EGR) module, which leverages explicit BEV geometric priors to reproject street-view features onto the satellite-view plane without requiring complex parameter inputs or depth information. To support evaluation, we construct two new datasets, Washington and Seattle, which include satellite imagery, map data, and panoramic street-view images, serving as benchmarks for cross-view, fine-grained building attribute segmentation. Extensive experiments conducted on these datasets, as well as on the public OmniCity and Brooklyn datasets, demonstrate that CRFFNet achieves mIoU improvements of 1.02% on Washington, 8.12% on Seattle, 2.29% on OmniCity, and 2.87% on Brooklyn compared to the second-best method. These improvements demonstrate the potential of our CRFFNet for applications involving large-scale multi-source data, contributing to more comprehensive urban analysis and planning.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103795"},"PeriodicalIF":15.5,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
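To illustrate what a ground-plane reprojection with explicit geometric priors can look like, here is a minimal sketch that samples an equirectangular street-view feature map onto a flat-ground BEV grid using only a camera height. Every convention here (camera height, grid extent, horizon at the middle row, forward direction at the centre column) is an assumption for illustration; this is not the CRFFNet EGR module.

```python
import math
import torch
import torch.nn.functional as F

def panorama_to_bev(feat: torch.Tensor, cam_height: float = 2.5,
                    bev_size: int = 64, bev_range: float = 25.0) -> torch.Tensor:
    """Toy flat-ground reprojection of equirectangular features to a BEV grid.
    feat: (N, C, H, W); rows span +90..-90 deg elevation, columns span -180..+180 deg azimuth.
    Returns (N, C, bev_size, bev_size) features sampled on the ground plane."""
    n = feat.shape[0]
    xs = torch.linspace(-bev_range, bev_range, bev_size)
    ys = torch.linspace(-bev_range, bev_range, bev_size)
    y, x = torch.meshgrid(ys, xs, indexing="ij")              # ground-plane metric coords
    dist = torch.sqrt(x ** 2 + y ** 2).clamp(min=1e-3)
    azimuth = torch.atan2(x, y)                                # 0 = forward, range [-pi, pi]
    elevation = torch.atan2(torch.full_like(dist, -cam_height), dist)  # below the horizon
    u = azimuth / math.pi                                      # normalized panorama column
    v = -2.0 * elevation / math.pi                             # normalized panorama row
    grid = torch.stack([u, v], dim=-1).unsqueeze(0).expand(n, -1, -1, -1)
    return F.grid_sample(feat, grid, align_corners=True)

bev = panorama_to_bev(torch.randn(1, 32, 256, 512))
print(bev.shape)  # torch.Size([1, 32, 64, 64])
```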
Survey of Neural Network Approaches to Target Tracking with an Emphasis on Interpretability
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103789
Marco Mari, Lauro Snidaro
{"title":"Survey of Neural Network Approaches to Target Tracking with an Emphasis on Interpretability","authors":"Marco Mari,&nbsp;Lauro Snidaro","doi":"10.1016/j.inffus.2025.103789","DOIUrl":"10.1016/j.inffus.2025.103789","url":null,"abstract":"<div><div>This survey examines recent advances in target tracking methods that incorporate neural networks, with a particular emphasis on their application to complex and dynamic tracking scenarios. While classical model-based approaches have traditionally dominated the field, they often struggle with nonlinear dynamics and unpredictable maneuvers. Conversely, learning-based methods, particularly those employing neural architectures, present compelling alternatives by leveraging data-driven representations and adaptive capabilities. This work provides a concise overview of conventional tracking frameworks to contextualize the evolution of neural approaches. A central contribution of the survey is a novel classification of neural tracking methods based on their level of interpretability, offering a unique perspective on how transparency and explainability are addressed in the design of modern tracking systems. The review synthesizes trends across a broad range of applications, compares methodological trade-offs, and identifies key challenges and open research directions, particularly in balancing performance with trustworthiness in real-world deployment.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103789"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145229531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unsupervised coefficient learning framework for variational pansharpening
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103790
Jin-Liang Xiao, Ting-Zhu Huang, Liang-Jian Deng, Huidong Jiang, Qibin Zhao, Gemine Vivone
{"title":"Unsupervised coefficient learning framework for variational pansharpening","authors":"Jin-Liang Xiao ,&nbsp;Ting-Zhu Huang ,&nbsp;Liang-Jian Deng ,&nbsp;Huidong Jiang ,&nbsp;Qibin Zhao ,&nbsp;Gemine Vivone","doi":"10.1016/j.inffus.2025.103790","DOIUrl":"10.1016/j.inffus.2025.103790","url":null,"abstract":"<div><div>Pansharpening combines a panchromatic (PAN) image and a low-resolution multispectral (LRMS) image to generate a high-resolution multispectral (HRMS) image. Variational optimization (VO) approaches have garnered significant attention due to their data-independent generalization capabilities and robust performance. However, these methods often face challenges in accurately estimating coefficients, a critical factor influencing the quality of the final results. Existing VO approaches typically perform linear coefficient estimation at a reduced-resolution scale, which limits their effectiveness and adaptability. To address these limitations, we propose a novel VO-based method under an unsupervised coefficient learning (UCL) framework. This approach retains the generalization ability of VO while enabling precise coefficient estimation through a nonlinear, full-resolution learning technique. Furthermore, the UCL framework eliminates the need for additional training data beyond the input pair (i.e., a PAN image and a LRMS image), offering a flexible and extensible solution applicable to other traditional methods based on coefficient estimation. Qualitative and quantitative experimental assessments on reduced- and full-resolution datasets demonstrate that the proposed method achieves state-of-the-art performance. The code is available at <span><span>https://github.com/Jin-liangXiao/UCL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103790"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
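For context on what the "linear coefficient estimation at a reduced-resolution scale" baseline looks like (the classical step the abstract contrasts against, not the proposed UCL method), here is a minimal NumPy sketch: fit band-combination weights that best reproduce a degraded PAN image from the LRMS bands by least squares. The synthetic data and weights are illustrative assumptions.

```python
import numpy as np

def estimate_band_coefficients(lrms: np.ndarray, pan_lr: np.ndarray) -> np.ndarray:
    """Classic reduced-resolution coefficient estimation (a baseline, not UCL):
    solve min_w || pan_lr - sum_b w[b] * lrms[..., b] ||^2 by least squares.
    lrms:   (H, W, B) low-resolution multispectral image
    pan_lr: (H, W)    panchromatic image degraded to the LRMS scale
    Returns w of shape (B,)."""
    h, w_, b = lrms.shape
    A = lrms.reshape(-1, b)      # each row holds the B band values of one pixel
    y = pan_lr.reshape(-1)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs

# Toy check: synthesize a PAN that is a known mixture of 4 bands and recover the weights.
rng = np.random.default_rng(0)
lrms = rng.random((64, 64, 4))
true_w = np.array([0.1, 0.3, 0.4, 0.2])
pan_lr = lrms @ true_w
print(np.round(estimate_band_coefficients(lrms, pan_lr), 3))  # ~[0.1 0.3 0.4 0.2]
```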
Traveller: Travel-pattern aware trajectory generation via autoregressive diffusion models
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103766
Yuxiao Luo, Songming Zhang, Kang Liu, Yang Xu, Ling Yin
{"title":"Traveller: Travel-pattern aware trajectory generation via autoregressive diffusion models","authors":"Yuxiao Luo ,&nbsp;Songming Zhang ,&nbsp;Kang Liu ,&nbsp;Yang Xu ,&nbsp;Ling Yin","doi":"10.1016/j.inffus.2025.103766","DOIUrl":"10.1016/j.inffus.2025.103766","url":null,"abstract":"<div><div>Trajectory Generation (TG) enables realistic simulation of individual movements for applications such as urban management, transportation planning, epidemic control, and privacy-preserving mobility analysis. However, existing TG methods, particularly unconditional diffusion models, struggle with spatiotemporal fidelity as they often overlook some travel patterns that are critical in an individual’s mobility behavior, such as recurrent location visits, movement scope, and temporal regularities. In this work, we propose the Autoregressive Diffusion Model for Travel-Pattern Aware Trajectory Generation (<strong>Traveller</strong>), a novel approach that integrates autoregressive travel-pattern modeling (AR-TempPlan) with diffusion-based trajectory generation (TravCond-Diff) to produce realistic and context-aware movement patterns. By leveraging the spatial anchor and temporal modes of visiting different locations, we derive an individual’s particular travel pattern as spatiotemporal constraints for guided trajectory generation. Building on this, AR-TempPlan generates a mask location sequence as the temporal modes, planning location transitions over time, while TravCond-Diff leverages this planning signal and home location, the spatial anchor, to guide spatial generation through a discrete diffusion process. Experiments on real-world datasets demonstrate that Traveller with the dual guidance mechanism enables the production of high-fidelity and individual trajectories that effectively capture complex human mobility behaviors while preserving privacy. The code and data are available at <span><span>https://github.com/YuxiaoLuo0013/Traveller</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103766"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
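As a toy illustration of the two constraints the abstract conditions on — a home location as spatial anchor and per-hour temporal visit modes — the sketch below derives both from raw (hour, location) records. The nighttime heuristic and most-frequent-location rule are assumptions for illustration, not the Traveller pipeline.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

def extract_travel_pattern(traj: List[Tuple[int, str]]) -> Tuple[Optional[str], Dict[int, str]]:
    """Toy travel-pattern extraction from (hour_of_day, location_id) records:
    - spatial anchor: most frequent location during nighttime hours (illustrative heuristic)
    - temporal modes: most frequent location for each hour of day."""
    night = Counter(loc for hour, loc in traj if hour >= 22 or hour < 6)
    home = night.most_common(1)[0][0] if night else None
    per_hour: Dict[int, Counter] = {}
    for hour, loc in traj:
        per_hour.setdefault(hour, Counter())[loc] += 1
    temporal_modes = {hour: cnt.most_common(1)[0][0] for hour, cnt in per_hour.items()}
    return home, temporal_modes

records = [(23, "A"), (2, "A"), (9, "B"), (9, "B"), (13, "C"), (23, "A")]
print(extract_travel_pattern(records))
# ('A', {23: 'A', 2: 'A', 9: 'B', 13: 'C'})
```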
Label-conditioned multi-GAN fusion: A robust data augmentation strategy for medical image segmentation
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103773
Junxin Chen, Renlong Zhang, Zhiheng Ye, Wen-Long Shang, Sibo Qiao, Zhihan Lyu
{"title":"Label-conditioned multi-GAN fusion: A robust data augmentation strategy for medical image segmentation","authors":"Junxin Chen ,&nbsp;Renlong Zhang ,&nbsp;Zhiheng Ye ,&nbsp;Wen-Long Shang ,&nbsp;Sibo Qiao ,&nbsp;Zhihan Lyu","doi":"10.1016/j.inffus.2025.103773","DOIUrl":"10.1016/j.inffus.2025.103773","url":null,"abstract":"<div><div>The performance of deep learning for medical image segmentation heavily relies on the quantity and quality of training data. However, lack of high-quality labeled data remains a critical bottleneck. It requires several hours of radiologist to annotate the organs in a CT/MRI. In addition, rare disease generally has limited samples for training, while anatomical boundary blurring and intra-class intensity heterogeneity also yield data scarcity on the other hand. To this end, this paper proposes a label-guided multi-GAN collaborative framework for medical image augmentation. Leveraging existing labels as conditional inputs, three GAN variants (Pix2pix, Pix2pixHD, SPADE) are trained in parallel to synthesize images in target domain. This design highlights anatomical regions, improves image quality, and enhances data diversity and quality at the same time. Experimental results on three modalities demonstrate that our approach is able to significantly boost segmentation performance across various segmentation networks.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103773"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145221722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
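A minimal sketch of the augmentation loop this abstract describes: existing label maps are fed as conditional inputs to several mask-to-image generators, and every synthesized image keeps its source mask as ground truth. The tiny convolutional "generators" here are placeholders standing in for pretrained Pix2pix/Pix2pixHD/SPADE checkpoints; nothing below is the authors' code.

```python
import torch
import torch.nn as nn
from typing import List, Tuple

def augment_with_label_conditioned_gans(label_maps: torch.Tensor,
                                        generators: List[nn.Module]) -> List[Tuple[torch.Tensor, torch.Tensor]]:
    """Sketch of label-guided augmentation: pass every mask through each pretrained
    mask-to-image generator and pair each synthetic image with its source mask."""
    pairs = []
    with torch.no_grad():
        for mask in label_maps:                 # mask: (1, H, W) index or one-hot map
            for gen in generators:
                image = gen(mask.unsqueeze(0))  # (1, C, H, W) synthetic image
                pairs.append((image.squeeze(0), mask))
    return pairs

# Toy stand-ins for the three generators (real ones would be loaded from checkpoints).
toy_gens = [nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Tanh()) for _ in range(3)]
masks = torch.randint(0, 2, (4, 1, 64, 64)).float()
augmented = augment_with_label_conditioned_gans(masks, toy_gens)
print(len(augmented))  # 4 masks x 3 generators = 12 extra training pairs
```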
GF-SVD: Global knowledge-infused singular value decomposition of large language models
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103774
Xiangxiang Gao, Weisheng Xie, Yuhan Lin, Chen Hang, Hongyang Han, Xiaolong Xu, Bo Liu
{"title":"GF-SVD: Global knowledge-infused singular value decomposition of large language models","authors":"Xiangxiang Gao,&nbsp;Weisheng Xie ,&nbsp;Yuhan Lin,&nbsp;Chen Hang,&nbsp;Hongyang Han,&nbsp;Xiaolong Xu,&nbsp;Bo Liu","doi":"10.1016/j.inffus.2025.103774","DOIUrl":"10.1016/j.inffus.2025.103774","url":null,"abstract":"<div><div>Singular Value Decomposition (SVD) provides an efficient solution for compressing and accelerating Large Language Models (LLMs) without retraining or specialized hardware. Despite its advantages, current SVD-based LLMs compression methods suffer from three critical limitations that degrade performance: (1) Cross-domain knowledge preservation is compromised, (2) Layer-isolated decomposition disrupts inter-layer information flow, and (3) Gradual knowledge erosion caused by aggressive truncation of singular values and corresponding vectors. To overcome these, we propose GF-SVD, a novel framework that integrates: <strong>(1) Hierarchical Knowledge Infusion:</strong> Enhances dataset diversity by integrating hierarchical knowledge to improve cross-domain generalization, <strong>(2) Global Information Integration:</strong> Captures inter-layer dependencies and broader context via weighted aggregation of multi-layer feature matrices, and <strong>(3) Knowledge-Enhanced Truncation and Updating:</strong> Truncates and updates weights with infused dataset to mitigate knowledge erosion. Extensive experiments demonstrate that GF-SVD surpasses existing SVD-based LLMs compression methods across diverse tasks, including knowledge-intensive question answering, complex reasoning, physical system, and mathematical problem-solving. Notably, GF-SVD can also improve inference speed by 2.36x on GPUs and 2.74x on CPUs at 60 % compression ratio.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103774"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145229528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
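For readers unfamiliar with SVD-based compression, the sketch below shows the basic operation GF-SVD builds on: replacing one weight matrix with two thin factors obtained by truncated SVD. The global-information weighting and knowledge-infused truncation/updating steps from the paper are not reproduced; layer sizes and the chosen rank are illustrative.

```python
import torch
import torch.nn as nn

def svd_compress_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Plain truncated-SVD compression of a linear layer: store W (out x in) as two
    thin factors of rank r, i.e. r*(in+out) parameters instead of in*out."""
    W = layer.weight.data                         # (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = Vh[:rank, :]                              # (rank, in)
    B = U[:, :rank] * S[:rank]                    # (out, rank), singular values folded in
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = A
    second.weight.data = B
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

layer = nn.Linear(4096, 4096)
compressed = svd_compress_linear(layer, rank=512)   # ~4.2M vs ~16.8M weight parameters
x = torch.randn(2, 4096)
print((layer(x) - compressed(x)).abs().max())       # reconstruction error from truncation
```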
Cross-layer frequency-spatial domain feature interaction awareness fusion for fine-grained visual classification
IF 15.5, Q1, Computer Science
Information Fusion Pub Date: 2025-09-29 DOI: 10.1016/j.inffus.2025.103788
Guanglei Sheng, Gang Hu, Xiaofeng Wang, Wei Chen, Jinling Jiang, Quanquan Xiao
{"title":"Cross-layer frequency-spatial domain feature interaction awareness fusion for fine-grained visual classification","authors":"Guanglei Sheng ,&nbsp;Gang Hu ,&nbsp;Xiaofeng Wang ,&nbsp;Wei Chen ,&nbsp;Jinling Jiang ,&nbsp;Quanquan Xiao","doi":"10.1016/j.inffus.2025.103788","DOIUrl":"10.1016/j.inffus.2025.103788","url":null,"abstract":"<div><div>To solve the problem of similar categories being difficult to distinguish in Fine-Grained Visual Classification (FGVC) tasks, existing FGVC methods mainly extract spatially local detail features that are discriminative for classification. These local detail features usually consist of high-frequency information. Therefore, considering the importance of high-frequency information for FGVC, this paper introduces a Cross-layer Frequency-spatial Domain Feature Interaction Awareness Fusion (CFD-FIAF) method that consists of four main components. First, a Frequency-domain Feature Complementary Module (FFCM) is designed to reinforce the high-frequency detailed information lost during the layer-by-layer downsampling process of the backbone network. Then, to extract frequency-domain features of interest, a Frequency-domain Feature Awareness Module (FFAM) is proposed to enhance the representation of discriminative local features and maintain the global structure at each granularity. In addition, to solve the problem of category prediction inconsistency of features with different granularity, Graph Convolutional Network Feature Fusion Module (GCN-FFM) and Prediction Consistency Distillation Loss (PCDL) are proposed to enhance the high-level semantic feature representation by fusing different discriminative features at each granularity. Experimental results demonstrate that the proposed method achieves competitive performance on four standard fine-grained visual classification benchmarks. Notably, it also achieved 99.8 % accuracy on our in-house Bulbus Fritillaria (a traditional Chinese medicine) dataset, highlighting its potential for fine-grained classification of traditional Chinese medicines.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103788"},"PeriodicalIF":15.5,"publicationDate":"2025-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145269565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
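To ground the frequency-domain terminology, here is a minimal sketch of isolating the high-frequency content of a feature map with an FFT high-pass mask — the kind of operation behind high-frequency detail extraction, though the actual FFCM/FFAM modules are considerably more elaborate. The cutoff and rectangular masking scheme are assumptions for illustration only.

```python
import torch

def high_frequency_component(feat: torch.Tensor, cutoff: float = 0.1) -> torch.Tensor:
    """Keep only spatial frequencies above `cutoff` (fraction of Nyquist) via a 2-D FFT.
    Toy frequency-domain detail extraction, not the CFD-FIAF modules.
    feat: (N, C, H, W) real-valued feature map."""
    n, c, h, w = feat.shape
    spec = torch.fft.fft2(feat)                          # complex spectrum, DC at (0, 0)
    fy = torch.fft.fftfreq(h).abs().view(h, 1)           # normalized frequencies in [0, 0.5]
    fx = torch.fft.fftfreq(w).abs().view(1, w)
    low_pass = (fy <= cutoff * 0.5) & (fx <= cutoff * 0.5)
    mask = (~low_pass).float()                           # zero out the low-frequency block
    return torch.fft.ifft2(spec * mask).real

x = torch.randn(2, 16, 56, 56)
hf = high_frequency_component(x)
print(hf.shape)  # torch.Size([2, 16, 56, 56])
```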