Title: Attention learning with counterfactual intervention based on feature fusion for fine-grained feature learning
Authors: Ning Yu, Long Chen, Xiaoyin Yi, Jiacheng Huang
Journal: Digital Signal Processing, vol. 163, Article 105215 (JCR Q2, Engineering, Electrical & Electronic; impact factor 2.9)
DOI: 10.1016/j.dsp.2025.105215
Published: 2025-04-07
URL: https://www.sciencedirect.com/science/article/pii/S1051200425002374
Citations: 0
Abstract
Deep learning models can learn features from large amounts of data and, in visual recognition tasks, usually localize the overall region of the target object accurately. However, in fine-grained scenarios with inter-class similarities, such as vehicle brand recognition and subspecies recognition in organisms, a model must capture the crucial distinguishing features and provide reliable explanations when its decisions are traced. This paper therefore builds on the idea of counterfactual intervention from causal reasoning and proposes attention learning with counterfactual intervention to learn the feature information that matters most in fine-grained recognition tasks. First, an iterative feature fusion attention module learns features at different levels and fuses them to capture the crucial features of the target object while suppressing attention to unimportant ones. Second, a counterfactual intervention is performed on the feature fusion-based attention map. The changes produced by the intervened variables serve as supervisory signals for attention learning, strengthening the features that contribute positively to the predicted result. In addition, a contrastive learning function acts as a constraint that keeps the model from focusing solely on salient features, so the network learns richer discriminative features. Finally, Grad-CAM visualization is used to explain the decision-making process. Experimental results show that the method learns important distinguishing features of the target object, weakens attention to non-critical regions, and supports reliable tracing of decision-making behavior.
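The abstract does not spell out how the intervention is computed, so the following is only a minimal sketch of one common formulation of counterfactual attention learning, not the authors' implementation. It assumes the counterfactual is a random attention map, that the effect is the difference between the factual and counterfactual predictions, and that this effect is trained with a cross-entropy loss. All function names, the toy data, and the linear classifier are hypothetical.

```python
import math
import random


def attend(features, attention):
    """Attention-weighted pooling: sum_i attention[i] * features[i]."""
    dim = len(features[0])
    return [sum(a * f[d] for a, f in zip(attention, features)) for d in range(dim)]


def classify(pooled, weights):
    """Hypothetical linear classifier: one logit per row of `weights`."""
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def counterfactual_effect(features, attention, weights, rng):
    """Return Y(A) - Y(A_bar), where A_bar is an assumed random attention map."""
    raw = [rng.random() for _ in attention]
    total = sum(raw)
    random_attention = [r / total for r in raw]  # intervention: replace A by A_bar
    factual = classify(attend(features, attention), weights)
    counterfactual = classify(attend(features, random_attention), weights)
    return [y - y_cf for y, y_cf in zip(factual, counterfactual)]


# Toy data (assumed): 4 spatial locations, 2-dim features, 2 classes.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
weights = [[2.0, 0.0], [0.0, 2.0]]   # class 0 responds to feature dimension 0
attention = [0.7, 0.1, 0.1, 0.1]     # attention map focused on location 0
label = 0

effect = counterfactual_effect(features, attention, weights, random.Random(0))
loss = cross_entropy(effect, label)  # supervisory signal on the effect
```

Minimizing `loss` rewards attention maps whose predictions beat those of an arbitrary map, which is one way to read the abstract's statement that the changes produced by the intervention supervise attention learning. The feature fusion module and the contrastive constraint are not modeled here.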
About the journal:
Digital Signal Processing: A Review Journal is one of the oldest and most established journals in the field of signal processing, yet it aims to be the most innovative. The Journal invites top-quality research articles at the frontiers of research in all aspects of signal processing. Our objective is to provide a platform for the publication of ground-breaking research in signal processing with both academic and industrial appeal.
The journal has a special emphasis on statistical signal processing methodology such as Bayesian signal processing, and encourages articles on emerging applications of signal processing such as:
• big data
• machine learning
• internet of things
• information security
• systems biology and computational biology
• financial time series analysis
• autonomous vehicles
• quantum computing
• neuromorphic engineering
• human-computer interaction and intelligent user interfaces
• environmental signal processing
• geophysical signal processing including seismic signal processing
• chemioinformatics and bioinformatics
• audio, visual and performance arts
• disaster management and prevention
• renewable energy