Information Fusion | Pub Date: 2025-02-28 | DOI: 10.1016/j.inffus.2025.103038
Thomas Gorges, Teresa Scholz, Stefan Saloman, Mathias Zinnen, Juliane Hoffmann, Nora Gourmelon, Andreas Maier, Sebastian Hettenkofer, Vincent Christlein
Title: PASAL: Progress- and sparsity-aware loss balancing for heterogeneous dataset fusion (Volume 120, Article 103038)
Abstract: Machine learning has seen widespread application in many areas. Despite theoretical advancements, the demand for high-quality and extensive data foundations keeps increasing. Real-world datasets are often small, and combining them is challenging due to the resulting sparsity and heterogeneity. Existing combination techniques merge datasets into a common space before training, causing drawbacks such as data loss and distortion of annotations. To address this, we fuse heterogeneous datasets by jointly training dataset-specific weighted sub-networks. Balancing losses from heterogeneous data sources is challenging, as current techniques are inadequate. We propose a novel progress- and sparsity-aware loss balancing method (PASAL), which adaptively balances sub-network losses based on individual learning progress and sparsity. As an example, we apply PASAL to the olfaction domain, where predicting smell properties from molecular structure is difficult due to subjective impressions, typically limited data, and a lack of unified datasets. Evaluating PASAL on the DREAM Olfaction Prediction Challenge, we improve the state-of-the-art Z-score from 9.92 to 10.10. Furthermore, by treating our AI as an annotator, we surpass human performance in the odor and pleasantness categories with statistical significance. Our findings are supported by a feature analysis indicating that our heterogeneous combination methodology enhances odor prediction.
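To make the balancing idea concrete, here is a minimal sketch that weights each dataset-specific sub-network loss by its recent learning progress and by annotation sparsity. The function, its inputs, and the softmax weighting are assumptions for illustration, not PASAL's published formulation.

```python
import torch

def pasal_style_weights(loss_history, label_density, temperature=2.0):
    """Adaptive per-dataset loss weights (all names hypothetical).

    loss_history:  dataset name -> [loss at t-2, loss at t-1]
    label_density: dataset name -> fraction of annotated labels in (0, 1]
    """
    names = list(loss_history)
    # Learning progress: a sub-network whose loss stopped shrinking has a
    # ratio l_{t-1} / l_{t-2} near (or above) 1 and receives a larger weight.
    progress = torch.tensor(
        [loss_history[n][1] / max(loss_history[n][0], 1e-8) for n in names]
    )
    # Sparsity: sparsely annotated datasets yield fewer supervised signals
    # per batch, so scale them up by the inverse annotation density.
    inv_density = torch.tensor([1.0 / max(label_density[n], 1e-3) for n in names])
    weights = torch.softmax(progress * inv_density / temperature, dim=0) * len(names)
    return dict(zip(names, weights.tolist()))

# Usage: total_loss = sum(w[n] * sub_loss[n] for n in sub_loss)
```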
Information Fusion | Pub Date: 2025-02-28 | DOI: 10.1016/j.inffus.2025.103030
Yan Wang, Henry K. Chu, Yuxiang Sun
Title: PEAFusion: Parameter-efficient Adaptation for RGB-Thermal fusion-based semantic segmentation (Volume 120, Article 103030)
Abstract: RGB-Thermal (RGB-T) semantic segmentation has attracted great attention in the autonomous-driving research community. Fully fine-tuning pre-trained networks is a common strategy in RGB-T semantic segmentation. However, as model size grows, updating all parameters becomes expensive and impractical, which hinders the wide application of pre-trained networks despite their effectiveness. To efficiently adapt pre-trained single-modality networks to the multi-modal RGB-T task, we design a module named the multi-view adapter-pair. The multi-view adapter-pair bridges the gap between pre-trained features and the features required for RGB-T semantic segmentation by approximating the high-dimensional updates to the hidden state during full fine-tuning within low-dimensional spaces. Moreover, we propose cross-modal self-attention, constructed from the self-attention operations in pre-trained transformer models. It fuses RGB and thermal data by expanding the self-attention mechanism in the pre-trained model from a single modality to multiple modalities. Due to the permutation invariance of the attention mechanism and the differences between the two modalities, we introduce a modality bias to guide the attention mechanism in learning dependencies across and within the two modalities. Leveraging these innovations, our network outperforms state-of-the-art methods on the MFNet, FMB, and PST900 datasets while maintaining parameter efficiency.
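The parameter-efficient ingredient is easiest to see in a plain bottleneck adapter, which approximates the hidden-state update of full fine-tuning inside a low-dimensional space, exactly the mechanism the abstract names. The class name, rank, and zero initialization below are generic choices; the paper's multi-view adapter-pair is more elaborate.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal low-rank adapter sketch (hypothetical layout)."""

    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project to low-dim space
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)    # project back to model dim
        nn.init.zeros_(self.up.weight)          # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Frozen backbone output plus a learned low-rank correction.
        return hidden + self.up(self.act(self.down(hidden)))
```

Only the adapter parameters are trained; the backbone stays frozen, which is what keeps the adaptation parameter-efficient.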
Information Fusion | Pub Date: 2025-02-27 | DOI: 10.1016/j.inffus.2025.103041
Han Zhang, Xuening Bai, Guangyao Hou, Xiongwen Quan
Title: A multi-step interaction network for multi-class classification based on OCT and OCTA images (Volume 120, Article 103041)
Abstract: OCT and OCTA images are an important basis for diagnosing multiple ophthalmic diseases. However, fusing these two highly redundant modalities while simultaneously projecting the 3D data is a challenge for multi-class classification. This paper proposes a novel Multi-step Interaction Network (MINet) that unifies projection and feature fusion in a single framework, in which OCT and OCTA images interact deeply for a seven-class ophthalmic-disease classification task. First, we design a Multi-modal Interaction Projection Module that iterates projection and shallow information interaction for effective feature selection. Second, a Feature Redundancy Removal Module compares the feature difference information between the two modalities to eliminate redundancy. Third, a Feature Interaction Fusion Module uses the differential modal information from the backbone CNN to apply per-modality attention and achieve interactive fusion. Finally, a classifier module generates the multi-class classification results. Experimental results show that our method achieves an accuracy of 0.8690, a precision of 0.6921, a recall of 0.7250, and an F1 score of 0.7081 on the OCTA-500 dataset. Comparative experiments with other state-of-the-art methods for OCT image classification, along with ablation experiments, demonstrate the superior performance of MINet.
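The redundancy-removal step rests on comparing feature differences between the two modalities. One hypothetical way to realize that idea is a difference-driven gate, sketched below; this is an illustrative stand-in, not the paper's exact module.

```python
import torch
import torch.nn as nn

class RedundancyRemoval(nn.Module):
    """Difference-gated fusion of two modality feature maps (illustrative)."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f_oct: torch.Tensor, f_octa: torch.Tensor):
        g = self.gate(f_oct - f_octa)  # large where the modalities disagree
        # Per-pixel convex blend: shared content is counted once instead of
        # twice, while disagreement steers the mix toward one modality.
        return f_oct * g + f_octa * (1.0 - g)
```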
Information Fusion | Pub Date: 2025-02-26 | DOI: 10.1016/j.inffus.2025.103039
Xin Zhou, Yongchao Zhang, Zheng Liu, Zeyu Jiang, Zhaohui Ren, Tianchuan Mi, Shihua Zhou
Title: IFIFusion: An independent feature information fusion model for surface defect detection (Volume 120, Article 103039)
Abstract: Existing surface-defect detection networks often rely on pre-trained classification models as a backbone to extract multi-scale features, and build decoder modules tailored to these features. However, since the classification model primarily captures identifiable class features in natural images, these decoder modules struggle to accurately delineate precise spatial structures from the backbone's multi-scale feature maps. In addition, discrete feature contents from different modules lack the necessary fusion, and the architecture cannot accurately classify the defective objects in images, resulting in lower detection performance. This paper therefore explores an Independent Feature Information Fusion model (IFIFusion) consisting of spatial multi-axis information fusion, independent classification and spatial information fusion, and multi-scale feature fusion for defect detection, fully exploiting the collaborative relationship between independent modules. First, we design a dual-attention transformer backbone to extract classification features, and a Light-Weight multi-scale Network (LWNet), parallel to the backbone, to provide independent features for the proposed Spatial multi-Axis Fusion module (SAF). We also provide an Independent Feature Fusion module (IFF), consisting primarily of a connector, convolution layers, batch normalization, ReLU, and interpolation, to enrich discrete feature contents. The SAF then aggregates and fuses the discriminative spatial features from non-classification feature maps. Finally, the feature maps at each scale are fused to achieve effective coupling and interaction, enhancing the representation of defective features. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art algorithms in visual quality and quantitative evaluations. The code is available at https://github.com/zhx-hub/IFIFusion/tree/main.
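Since the abstract enumerates the IFF ingredients (connector, convolution layers, batch normalization, ReLU, interpolation), a plausible minimal wiring is shown below; the exact arrangement inside the released code may differ, so treat this as an assumption-labeled sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IFFBlock(nn.Module):
    """Sketch of an independent feature fusion block (hypothetical wiring)."""

    def __init__(self, c_a: int, c_b: int, c_out: int):
        super().__init__()
        self.fuse = nn.Sequential(           # "connector" = channel concat
            nn.Conv2d(c_a + c_b, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # Resample feat_b onto feat_a's spatial grid before fusing.
        feat_b = F.interpolate(feat_b, size=feat_a.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([feat_a, feat_b], dim=1))
```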
Information Fusion | Pub Date: 2025-02-26 | DOI: 10.1016/j.inffus.2025.103040
Zihao Cai, Zehui Xiao, Ming Lin, Zheqing Zhou, Jie Tao
Title: Event-triggered set-membership fusion estimation of multi-rate multi-sensor systems under multiple cyber attacks (Volume 120, Article 103040)
Abstract: This article concerns the set-membership fusion estimation problem for multi-rate multi-sensor systems under multiple cyber attacks. To save limited communication resources, a novel adaptive event-triggered strategy is developed to control the frequency of information transmission. In contrast to conventional adaptive strategies, a transformation law is introduced to establish the triggering condition, effectively avoiding triggers caused by small fluctuations after the error has converged. In addition, a hybrid approach is proposed to address the rate inconsistency among the various components, significantly improving both the efficiency and accuracy of the estimation algorithm. A cryptography-based privacy-protection scheme is then presented to defend against deception attacks and replay attacks. By incorporating the concept of set-membership estimation, an optimization problem for secure fusion estimation is formulated. In light of this, an online recursive algorithm is proposed to continuously obtain the weighting coefficients and the optimal ellipsoid set, ensuring that the error remains confined within the desired ellipsoid. Finally, the superiority and feasibility of the proposed scheme are validated through a simulation example and a practical experiment.
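For readers unfamiliar with event-triggered transmission, the generic pattern is a relative-threshold test with an adaptive threshold. The sketch below shows that pattern only; the paper's transformation law is not given in the abstract, so every name and constant here is hypothetical.

```python
import numpy as np

def should_transmit(y_k: np.ndarray, y_last: np.ndarray, sigma: float) -> bool:
    """Relative event trigger: send only if the measurement has drifted
    far enough from the last transmitted value (generic sketch)."""
    gap = np.linalg.norm(y_k - y_last) ** 2
    return gap > sigma * np.linalg.norm(y_k) ** 2

def update_threshold(sigma: float, triggered: bool,
                     lo: float = 0.01, hi: float = 0.5, rate: float = 0.05) -> float:
    """Adapt the threshold: raise it right after a transmission so small
    post-convergence fluctuations cannot re-trigger, lower it while the
    channel stays silent so genuine changes eventually get through."""
    sigma = sigma + rate if triggered else sigma - rate
    return float(np.clip(sigma, lo, hi))
```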
Information Fusion | Pub Date: 2025-02-25 | DOI: 10.1016/j.inffus.2025.103032
Andreas Holzinger, Niko Lukač, Dzemail Rozajac, Emile Johnston, Veljka Kocic, Bernhard Hoerl, Christoph Gollob, Arne Nothdurft, Karl Stampfer, Stefan Schweng, Javier Del Ser
Title: Enhancing trust in automated 3D point cloud data interpretation through explainable counterfactuals (Volume 119, Article 103032)
Abstract: This paper introduces a novel framework for augmenting explainability in the interpretation of point cloud data by fusing expert knowledge with counterfactual reasoning. Given the complexity and voluminous nature of point cloud datasets, derived predominantly from LiDAR and 3D scanning technologies, achieving interpretability remains a significant challenge, particularly in smart cities, smart agriculture, and smart forestry. This research posits that integrating expert knowledge with counterfactual explanations – speculative scenarios illustrating how altering input data points could lead to different outcomes – can significantly reduce the opacity of deep learning models processing point cloud data. The proposed optimization-driven framework utilizes expert-informed ad-hoc perturbation techniques to generate meaningful counterfactual scenarios when employing state-of-the-art deep learning architectures. The optimization process minimizes a multi-criteria objective comprising counterfactual metrics such as similarity, validity, and sparsity, which are specifically tailored for point cloud datasets. These metrics provide a quantitative lens for evaluating the interpretability of the counterfactuals. Furthermore, the framework allows explicit, interpretable counterfactual perturbations to be defined at its core, thereby involving the model's audience in the counterfactual generation pipeline and, ultimately, improving their overall trust in the process. Results demonstrate a notable improvement in both the interpretability of the model's decisions and the actionable insights delivered to end-users. Additionally, the study explores the role of counterfactual reasoning, coupled with expert input, in enhancing trustworthiness and enabling human-in-the-loop decision-making. By bridging the gap between complex data interpretations and user comprehension, this research advances the field of explainable AI, contributing to the development of transparent, accountable, and human-centered artificial intelligence systems.
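The multi-criteria objective named in the abstract (validity, validity-driving loss; similarity; sparsity) can be written down generically for a point cloud. The weights and metric choices below are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def counterfactual_objective(x, delta, model, target_class,
                             w_valid=1.0, w_sim=1.0, w_sparse=0.1):
    """Multi-criteria counterfactual loss for a point cloud x of shape (N, 3);
    delta is the learnable per-point perturbation (generic sketch)."""
    x_cf = x + delta
    # Validity: the perturbed cloud should be classified as the target class.
    validity = F.cross_entropy(model(x_cf.unsqueeze(0)),
                               torch.tensor([target_class]))
    # Similarity: stay close to the original geometry (mean squared shift).
    similarity = delta.pow(2).sum(dim=1).mean()
    # Sparsity: move as few points as possible (L1 relaxation of a count).
    sparsity = delta.abs().sum(dim=1).mean()
    return w_valid * validity + w_sim * similarity + w_sparse * sparsity
```

Minimizing this over `delta` with any gradient optimizer yields a counterfactual; expert knowledge would enter by constraining which points `delta` is allowed to touch.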
Information Fusion | Pub Date: 2025-02-25 | DOI: 10.1016/j.inffus.2025.103034
Zhentian Bian, Bin Yao, Qing Li
Title: SPAFPN: Wear a multi-scale feature fusion scarf around neck for real-time object detector (Volume 119, Article 103034)
Abstract: Feature extraction modules for backbone networks are developing rapidly in real-time object detection, while multi-scale designs for the neck iterate slowly. The traditional PAFPN falls short in cross-scale propagation efficiency. Many newly proposed multi-scale fusion methods make up for this drawback but are difficult to apply widely because of their complex fusion modules and unfriendly training. In this paper, we propose the Scarf Path Aggregation Feature Pyramid Network (SPAFPN), an advanced neck structure for multi-scale fusion in real-time object detection. SPAFPN adheres to the decentralized multi-scale fusion idea of "Light Fusion, Heavy Decouple" while inheriting a versatile and portable design. SPAFPN promotes cross-scale, low-loss transfer of features and improves model performance; it mainly consists of Pyramid Fusion and Multi-Concat modules. Experimental results on the MS COCO dataset show that SPAFPN-HG-N/S/M achieve 5.2%/3.2%/1.1% mAP improvements over YOLOv8, respectively. We also adopted several recent backbone networks to verify the scalability of SPAFPN; combined with either C2f or GELAN as backbone modules, SPAFPN performs well. In addition, SPAFPN maintains good performance when applied to a DETR-based real-time object detector. The code is available at https://github.com/ztbian-bzt/SPAFPN.
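The decentralized "light fusion" pattern over pyramid levels can be illustrated simply: resample every level to a target stride, concatenate, and apply a cheap projection. This sketches the general pattern only, not SPAFPN's exact Pyramid Fusion or Multi-Concat modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleConcat(nn.Module):
    """Fuse all pyramid levels at one target scale (illustrative layout)."""

    def __init__(self, in_channels: list[int], out_channels: int):
        super().__init__()
        # Cheap 1x1 projection after concatenation keeps the fusion "light".
        self.proj = nn.Conv2d(sum(in_channels), out_channels, kernel_size=1)

    def forward(self, feats: list[torch.Tensor], target_idx: int):
        size = feats[target_idx].shape[-2:]
        # Nearest-neighbor resampling works for both up- and down-scaling.
        resampled = [
            f if f.shape[-2:] == size
            else F.interpolate(f, size=size, mode="nearest")
            for f in feats
        ]
        return self.proj(torch.cat(resampled, dim=1))
```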
Information Fusion | Pub Date: 2025-02-25 | DOI: 10.1016/j.inffus.2025.103037
Jianwen Sun, Wangzi Shi, Xiaoxuan Shen, Shengyingjie Liu, Luona Wei, Qian Wan
Title: Multi-objective math problem generation using large language model through an adaptive multi-level retrieval augmentation framework (Volume 119, Article 103037)
Abstract: Math problems are an important knowledge carrier and evaluation means in personalized teaching. The high cost of compiling them manually motivates research on automatic math problem generation. Many previous studies have focused on generating math word problems, which struggle to meet real teaching needs because they are oriented toward a single task objective and produce results with little variety. By fusing external knowledge through retrieval-augmented generation (RAG), large language models (LLMs) can generate a variety of math problems, but the generated results still suffer from limitations such as poor knowledge consistency, uncontrollability, and high computational cost. In this paper, we propose the task of multi-objective math problem generation (MMPG). This task introduces the triple generation objectives of question type, knowledge point, and difficulty in response to teaching needs in real scenarios. To the best of our knowledge, this is the first study to consider multiple objectives in math problem generation. On this basis, we further design an adaptive multi-level retrieval augmentation framework (AMRAF) that enables an LLM to generate multi-objective math problems. Thanks to fine-grained information retrieval and fusion, this plug-and-play framework effectively improves generation performance without parameter tuning of the target model. To verify the effectiveness of the proposed framework and provide a benchmark for subsequent research, we construct an MMPG dataset containing 9,000 samples. Experimental results demonstrate the superiority and effectiveness of our framework.
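The triple-objective retrieval idea can be illustrated with a small prompt builder that gathers exemplars per objective level and conditions the LLM on all three constraints. `retrieve` is a hypothetical per-level retriever, and the prompt wording is invented for illustration; AMRAF's actual levels and templates are not given in the abstract.

```python
def build_mmpg_prompt(objectives: dict, retrieve, k: int = 2) -> str:
    """Assemble a generation prompt from multi-level retrieved exemplars.

    objectives: {"question_type": ..., "knowledge_point": ..., "difficulty": ...}
    retrieve(level, query, k): hypothetical retriever returning k exemplar
    problems matching the query at the given level.
    """
    exemplars = []
    for level in ("question_type", "knowledge_point", "difficulty"):
        exemplars += retrieve(level, objectives[level], k)
    lines = [
        "Generate one math problem satisfying all of the objectives:",
        f"- question type: {objectives['question_type']}",
        f"- knowledge point: {objectives['knowledge_point']}",
        f"- difficulty: {objectives['difficulty']}",
        "Reference problems:",
    ]
    lines += [f"{i + 1}. {e}" for i, e in enumerate(exemplars)]
    return "\n".join(lines)
```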
Information Fusion | Pub Date: 2025-02-24 | DOI: 10.1016/j.inffus.2025.103035
Haiyue Wang, Wensheng Zhang, Quan Wang, Xiaoke Ma
Title: Adaptive structural-guided multi-level representation learning with graph contrastive for incomplete multi-view clustering (Volume 119, Article 103035)
Abstract: Incomplete multi-view clustering (IMC) is a pivotal task in machine learning, encompassing several unresolved challenges, such as the representation of objects, relations among various views, discriminability of features, and data restoration. To address these challenges, we propose a novel Adaptive Structural-guided Multi-level representation Learning with Graph Contrastive algorithm for IMC (ASMLGC), in which feature learning, data restoration, and clustering are integrated simultaneously. Concretely, ASMLGC learns multi-level representations of objects by extending auto-encoders, explicitly capturing the hierarchical structure of multi-view data and providing a better, more comprehensive strategy to characterize data at multiple resolutions. Missing views are recovered by leveraging the multi-level representations, where global and local information are fully exploited to enhance the accuracy and robustness of imputation. Furthermore, ASMLGC employs graph contrastive learning to maximize intra-cluster consistency, where information derived from various resolutions, such as the feature level and meta-structure level, is used to construct positive and negative samples, thereby improving feature discriminability. Extensive experimental results confirm that ASMLGC outperforms baselines on benchmark datasets, particularly on those with complicated hierarchical structure. The proposed algorithm can be applied to bioinformatics, medical image analysis, and social network analysis.
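At its core, the graph contrastive ingredient is an InfoNCE-style loss pulling together two representations of the same object while pushing apart representations of different objects. A generic version is shown below; ASMLGC's construction of positives and negatives from meta-structures is not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.5):
    """Contrastive loss between two representation levels.

    z_a, z_b: (N, d) embeddings where row i of both tensors describes the
    same object (the positive pair); all other rows act as negatives.
    """
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature                 # (N, N) similarities
    labels = torch.arange(z_a.size(0), device=z_a.device)  # diagonal positives
    return F.cross_entropy(logits, labels)
```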
Information Fusion | Pub Date: 2025-02-24 | DOI: 10.1016/j.inffus.2025.103011
Zichang Tan, Guiwei Zhang, Zihui Tan, Prayag Tiwari, Yi Wang, Yang Yang
Title: CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification (Volume 120, Article 103011)
Abstract: Occluded person re-identification (ReID) is challenging since persons are frequently perturbed by various occlusions. Existing mainstream schemes prioritize aligning fine-grained body parts using error-prone, computation-intensive information, which can incur high estimation error and heavy computation. To this end, we present the Camera-specific Class Activation Map (CAM²), designed to identify critical foreground components with interpretability and computational efficiency. Building on this foundation, we introduce the CAM²-guided Vision Transformer, termed CAM²Former, with three core designs. First, we develop Fusion of Camera-specific Class Activation Maps, termed CAM²Fusion, which consists of positive and negative CAM² that operate in synergy to capture visual patterns representative of the discriminative foreground components. Second, to enhance the representation of pivotal foreground components, we introduce a CAM²Fusion-attention mechanism. This strategy imposes sparse attention weights on identity-agnostic interference discerned by the positive and negative CAM². Third, since the enhancement of foreground representations in CAM²Former depends on camera-specific classifiers, which are unavailable during inference, we introduce a consistent learning scheme. This design ensures that representations derived from the vanilla ViT align with those obtained via CAM²Former, facilitating the extraction of discriminative foreground representations and circumventing the CAM² dependency during inference without additional complexity. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on two occluded datasets (Occluded-Duke and Occluded-REID) and two holistic datasets (Market1501 and MSMT17), achieving an R1 of 74.4% and an mAP of 64.8% on Occluded-Duke.
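The class activation map underlying CAM² is itself a textbook computation: project a class's classifier weights onto the spatial feature map. The sketch below shows that standard step; the camera-specific classifiers that distinguish CAM², and the positive/negative fusion, are extra ingredients not reproduced here.

```python
import torch

def class_activation_map(feat: torch.Tensor, fc_weight: torch.Tensor,
                         cls_idx: int) -> torch.Tensor:
    """Standard CAM computation (generic sketch).

    feat:      (C, H, W) feature map from the backbone
    fc_weight: (num_classes, C) linear classifier weight matrix
    """
    # Weighted sum of channels by the class's classifier weights.
    cam = torch.einsum("c,chw->hw", fc_weight[cls_idx], feat)
    cam = torch.relu(cam)                       # keep positive evidence only
    return cam / cam.max().clamp(min=1e-8)      # normalize to [0, 1]
```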