Pattern Recognition: Latest Publications

4DStyleGaussian: Generalizable 4D style transfer with Gaussian splatting
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-12 | DOI: 10.1016/j.patcog.2025.112422
Wanlin Liang, Hongbin Xu, Weitao Chen, Feng Xiao, Wenxiong Kang
Abstract: 3D neural style transfer has gained significant attention for its potential to provide user-friendly stylization with 3D spatial consistency. However, existing 3D style transfer methods often struggle with inference efficiency, generalization, and temporal consistency when handling dynamic scenes. In this paper, we introduce 4DStyleGaussian, a novel 4D style transfer framework designed to achieve real-time stylization of arbitrary style references while maintaining reasonable content affinity, multi-view consistency, and temporal coherence. Our approach leverages an embedded 4D Gaussian Splatting representation, trained with a reversible neural network to reduce content loss and artifacts during feature distillation. Using the pre-trained 4D embedded Gaussians for efficient and view-consistent rendering, we predict a 4D style transformation matrix that facilitates spatially and temporally consistent style transfer. Experiments demonstrate that our method achieves high-quality and generalizable stylization for 4D scenarios with enhanced efficiency and spatial-temporal consistency, with 7.1% lower LPIPS and 2.5× faster inference compared to existing methods.
(Pattern Recognition, Volume 172, Article 112422)
Citations: 0
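The central operation described above, predicting a style transformation matrix and applying it to embedded Gaussian features, can be sketched in a few lines of PyTorch. This is a minimal illustration only; the module name, feature dimensions, and the use of a plain MLP to predict the matrix are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StyleTransformPredictor(nn.Module):
    """Hypothetical sketch: predict a D x D style transformation matrix from a
    style embedding and apply it to per-Gaussian embedded features."""
    def __init__(self, feat_dim=32, style_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(style_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim * feat_dim),
        )
        self.feat_dim = feat_dim

    def forward(self, gaussian_feats, style_embedding):
        # gaussian_feats: (N, D) embedded features of N Gaussians
        # style_embedding: (style_dim,) embedding of the style reference
        T = self.mlp(style_embedding).view(self.feat_dim, self.feat_dim)
        return gaussian_feats @ T.T  # stylized per-Gaussian features

if __name__ == "__main__":
    model = StyleTransformPredictor()
    feats = torch.randn(10000, 32)    # embedded features of 10k Gaussians
    style = torch.randn(256)          # e.g. pooled features of a style image
    print(model(feats, style).shape)  # torch.Size([10000, 32])
```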
Retinex-guided generative diffusion prior for low-light image enhancement
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-11 | DOI: 10.1016/j.patcog.2025.112421
Zunjin Zhao, Daming Shi
Abstract: Existing Retinex-based training-free low-light image enhancement (LLIE) methods often rely on complex architectures or lack support for text-controlled personalization. In this paper, we propose RetinexGDP, a training-free and text-controllable LLIE framework that uniquely integrates Retinex-based image modeling with generative diffusion priors. First, we introduce a simplified Retinex decomposition by embedding weighted total variation optimization into a single Gaussian convolutional layer, enabling robust illumination estimation without the need for training. Next, we guide the diffusion denoising process using the estimated reflectance map, employing patch-wise inversion and reflectance-conditioned sampling to effectively suppress noise while preserving structural details. Finally, unlike previous diffusion-based LLIE methods that perform only monotonous global brightness enhancement, we incorporate text guidance into the sampling process, enabling controllable enhancement that aligns with user-specific stylistic preferences. RetinexGDP thus provides a modular, interpretable, and text-controllable solution for low-light image enhancement. Experimental results show that RetinexGDP achieves state-of-the-art performance in terms of NIQMC and CPCQI metrics across seven real-world datasets. Code will be available at: https://github.com/zhaozunjin/PLIE
(Pattern Recognition, Volume 172, Article 112421)
Citations: 0
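The simplified Retinex decomposition described above estimates illumination with a single Gaussian convolution and obtains reflectance by division. A minimal PyTorch sketch follows; the plain Gaussian kernel stands in for the paper's TV-regularized Gaussian layer, and the kernel size and sigma are assumptions.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=15, sigma=5.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)

def retinex_decompose(img, eps=1e-4):
    """img: (1, 3, H, W) low-light image in [0, 1].
    Returns (illumination, reflectance) with img ~= illumination * reflectance."""
    # Illumination estimated per channel with a single Gaussian convolution
    # (a stand-in for the paper's weighted-TV Gaussian layer).
    k = gaussian_kernel().repeat(3, 1, 1, 1)
    pad = k.shape[-1] // 2
    illumination = F.conv2d(F.pad(img, [pad] * 4, mode="reflect"), k, groups=3)
    illumination = illumination.clamp(min=eps)
    reflectance = (img / illumination).clamp(0, 1)
    return illumination, reflectance

if __name__ == "__main__":
    x = torch.rand(1, 3, 128, 128) * 0.2   # synthetic dark image
    L, R = retinex_decompose(x)
    print(L.shape, R.shape)
```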
Structural-prior guided bi-generative network for image inpainting
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-10 | DOI: 10.1016/j.patcog.2025.112432
Jiajun Zhang, Jizhao Liu, Huaikun Zhang, Jibao Zhang, Jing Lian
Abstract: Image inpainting remains challenging when realistic textures must be reconstructed while keeping semantic structures consistent across large missing regions. Popular structural-prior guidance methods rely primarily on the reconstruction of structural features. Owing to the Markovian property of purely feedforward architectures, noise accumulates and propagates through early network layers; without intermediate feedback mechanisms, minor artifacts in shallow layers are nonlinearly amplified by successive convolutions and cannot be corrected in time, hindering the extraction of valid structural information. To this end, we present a bi-generative network (Bi-GNet) guided by specific semantic structures, consisting of an auxiliary network N_s and an inpainting network N_inp, where N_s provides structural prior information to N_inp for reconstructing the texture details of images. Additionally, we introduce a spatial coordinate attention (SCA) module and an adaptive feature filtering (AFF) module to ensure structural consistency and texture plausibility in the reconstructed content. Experiments demonstrate that Bi-GNet significantly outperforms other state-of-the-art approaches on three datasets and achieves good inpainting results on the Mogao Grottoes mural dataset.
(Pattern Recognition, Volume 172, Article 112432)
Citations: 0
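The bi-generative idea, an auxiliary network N_s whose structural prediction conditions the inpainting network N_inp, can be sketched with two toy convolutional networks. The layer choices below are placeholders for illustration, not the Bi-GNet architecture.

```python
import torch
import torch.nn as nn

class StructureNet(nn.Module):
    """Hypothetical stand-in for the auxiliary network N_s: predicts a coarse
    structure map (e.g. edges) from the masked image and the mask."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, masked_img, mask):
        return self.net(torch.cat([masked_img, mask], dim=1))

class InpaintNet(nn.Module):
    """Hypothetical stand-in for N_inp: reconstructs the image conditioned on
    the structural prior produced by N_s."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(5, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, masked_img, mask, structure):
        return self.net(torch.cat([masked_img, mask, structure], dim=1))

if __name__ == "__main__":
    img = torch.rand(1, 3, 64, 64)
    mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
    masked = img * (1 - mask)
    s = StructureNet()(masked, mask)
    out = InpaintNet()(masked, mask, s)
    print(s.shape, out.shape)
```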
Gradient semi-masking for improving adversarial robustness
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-10 | DOI: 10.1016/j.patcog.2025.112433
Xinlei Liu, Tao Hu, Peng Yi, Baolin Li, Jichao Xie, Hailong Ma
Abstract: In gradient masking, certain complex signal processing and probabilistic optimization strategies exhibit favorable characteristics such as nonlinearity, irreversibility, and feature preservation, offering new solutions for adversarial defense. Inspired by this, we propose a plug-and-play gradient semi-masking module (GSeM) to improve the adversarial robustness of neural networks. GSeM contains a feature straight-through pathway that allows normal gradient propagation and a feature mapping pathway that interrupts gradient flow. These multi-pathway, semi-masking characteristics cause GSeM to behave differently when processing data and gradients: during data processing, GSeM compresses the state space of features while introducing white-noise augmentation, whereas during gradient processing it leads to inefficient updates of certain parameters and ineffective generation of training examples. To address this shortcoming, we correct gradient propagation and introduce gradient-corrected adversarial training. Extensive experiments demonstrate that GSeM differs fundamentally from earlier gradient masking methods: it genuinely enhances the adversarial defense performance of neural networks, surpassing previous state-of-the-art approaches.
(Pattern Recognition, Volume 172, Article 112433)
Citations: 0
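A semi-masking behavior of the kind described, with data flowing through a state-space-compressing, noise-augmented mapping while gradients follow an identity pathway, can be illustrated with a straight-through trick. This sketch only illustrates the general concept; the quantization levels and noise scale are assumptions, and this is not the paper's GSeM module.

```python
import torch
import torch.nn as nn

class GradientSemiMask(nn.Module):
    """Conceptual sketch: forward pass compresses the feature state space
    (quantization) and adds white noise, while gradients bypass that mapping
    through an identity pathway."""
    def __init__(self, levels=16, noise_std=0.1):
        super().__init__()
        self.levels, self.noise_std = levels, noise_std

    def forward(self, x):
        mapped = torch.round(x * self.levels) / self.levels   # compress state space
        if self.training:
            mapped = mapped + self.noise_std * torch.randn_like(mapped)  # white noise
        # Data takes the mapped pathway; gradients take the identity pathway,
        # so the mapping itself contributes no gradient (it is detached).
        return x + (mapped - x).detach()

if __name__ == "__main__":
    layer = GradientSemiMask()
    x = torch.randn(4, 8, requires_grad=True)
    layer(x).sum().backward()
    print(x.grad.mean())   # identity-pathway gradient: all ones, mean 1.0
```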
Adaptive multi-view consistency clustering via structure-enhanced contrastive learning
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-09 | DOI: 10.1016/j.patcog.2025.112409
Xuqian Xue, Qi Cai, Zhanwei Zhang, Yiming Lei, Hongming Shan, Junping Zhang
Abstract: Current state-of-the-art deep multi-view clustering methods resort to contrastive learning to learn consensus representations with Cross-View Consistency (CVC). However, contrastive learning has inherent limitations when applied to multi-view clustering. On one hand, it suffers from the class collision issue, compromising the discriminability of the consensus representation. On the other hand, contrastive alignment of two views of different quality can degrade the representation of the higher-quality view, weakening the robustness of the consensus representation. To alleviate these issues, this paper presents an Adaptive Multi-view consistency clustering method via structure-enhanced contrastive learning (AdaM), which learns a multi-faceted consensus representation that balances view consistency, discriminability, and robustness. Specifically, we first design a view fusion module and a structural learning module to learn view weights and structural relationships among samples, respectively, from which the consensus representation is derived. Second, beyond CVC, we propose a clustering framework called Adaptive Multi-View Consistency (AMVC), which adaptively aligns each view-specific representation with the consensus representation based on the learned view weights; we theoretically demonstrate the superiority of AMVC over CVC in learning robust consensus representations. Third, AdaM leverages the structural relationships among samples to refine the conventional contrastive loss, further enhancing the discriminability of the consensus representation. Extensive experimental results on eight datasets demonstrate the superior performance of AdaM over eight advanced multi-view clustering baselines.
(Pattern Recognition, Volume 172, Article 112409)
Citations: 0
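The adaptive alignment of view-specific representations to a weighted consensus can be sketched as a weight-scaled contrastive loss against the fused consensus embedding. The InfoNCE-style formulation below is an assumption made for illustration, not the AdaM objective.

```python
import torch
import torch.nn.functional as F

def adaptive_consensus_alignment(views, weight_logits, temperature=0.5):
    """Sketch: fuse V view embeddings into a consensus with learned view weights,
    then align each view to the consensus with a contrastive loss scaled by its weight.
    views: list of V tensors of shape (N, D); weight_logits: tensor of shape (V,)."""
    w = torch.softmax(weight_logits, dim=0)                      # learned view weights
    consensus = sum(w[v] * views[v] for v in range(len(views)))  # weighted fusion, (N, D)
    consensus = F.normalize(consensus, dim=1)
    loss = 0.0
    for v, z in enumerate(views):
        z = F.normalize(z, dim=1)
        logits = z @ consensus.t() / temperature                 # (N, N) similarities
        targets = torch.arange(z.size(0))                        # positives on the diagonal
        loss = loss + w[v] * F.cross_entropy(logits, targets)
    return loss

if __name__ == "__main__":
    views = [torch.randn(32, 64) for _ in range(3)]
    weight_logits = torch.zeros(3, requires_grad=True)
    print(adaptive_consensus_alignment(views, weight_logits))
```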
Clinical knowledge enhanced medical image classification
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-09 | DOI: 10.1016/j.patcog.2025.112414
Zhikang Xu, Jiye Liang, Zhipeng Wei, Xiaodong Yue, Deyu Li
Abstract: Owing to the scarcity of data in the medical field, deep learning-based medical image classification faces challenges in both accuracy and reliability. Foundation models (FMs) offer a promising enhancement strategy: text embeddings of medical knowledge extracted from FMs can guide a task-specific classification model. However, clinical knowledge is generally structured, and pure text may not be an expressive enough knowledge representation to enhance the downstream model. Moreover, lesion areas are often subtle, so coupling FMs to the downstream model in a coarse-grained manner still struggles to attend precisely to the lesions. To tackle these challenges, we propose a medical image classification model that embeds clinical knowledge by combining graphs and FMs. First, we represent clinical rules as graphs, where each node describes a critical characteristic of a disease. During training, we use FMs to extract embeddings of the node text descriptions and a graph transformer to extract a global representation of the graph. With a vision transformer encoding the input images, a global-local alignment module transfers clinical knowledge by aligning the image branch and the graph branch at the image-to-graph level and the patch-to-vertex level, respectively. Moreover, we propose a dynamic image patch selection method to reduce the model's attention to irrelevant and noisy regions. Experimental results on a bladder tumor classification dataset verify that, even with limited training data, the proposed method not only achieves state-of-the-art performance but also accurately attends to lesion areas, improving trustworthiness.
(Pattern Recognition, Volume 172, Article 112414)
Citations: 0
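The two-level global-local alignment between the image branch and the graph branch can be sketched as an image-to-graph contrastive term plus a patch-to-vertex matching term. Both terms below are illustrative assumptions about how such an alignment could be scored, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def global_local_alignment_loss(img_global, patch_tokens, graph_global, node_embeds, tau=0.07):
    """Hypothetical sketch of a two-level alignment.
    img_global: (B, D) image embeddings, patch_tokens: (B, P, D) patch embeddings,
    graph_global: (B, D) graph embeddings, node_embeds: (B, V, D) node embeddings."""
    # Image-to-graph level: contrast each image with its own knowledge graph.
    zi = F.normalize(img_global, dim=-1)
    zg = F.normalize(graph_global, dim=-1)
    logits = zi @ zg.t() / tau
    global_loss = F.cross_entropy(logits, torch.arange(zi.size(0)))
    # Patch-to-vertex level: pull each graph vertex toward its best-matching patch.
    zp = F.normalize(patch_tokens, dim=-1)
    zv = F.normalize(node_embeds, dim=-1)
    sim = torch.einsum("bvd,bpd->bvp", zv, zp)      # (B, V, P) vertex-patch similarities
    local_loss = (1 - sim.max(dim=-1).values).mean()
    return global_loss + local_loss

if __name__ == "__main__":
    B, P, V, D = 4, 49, 6, 128
    loss = global_local_alignment_loss(torch.randn(B, D), torch.randn(B, P, D),
                                       torch.randn(B, D), torch.randn(B, V, D))
    print(loss)
```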
Preserving privacy without compromising accuracy: Machine unlearning for handwritten text recognition
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-09 | DOI: 10.1016/j.patcog.2025.112411
Lei Kang, Xuanshuo Fu, Lluis Gomez, Alicia Fornés, Ernest Valveny, Dimosthenis Karatzas
Abstract: Handwritten Text Recognition (HTR) is crucial for document digitization, but handwritten data can contain user-identifiable features, such as unique writing styles, posing privacy risks. Regulations such as the "right to be forgotten" require models to remove these sensitive traces without full retraining. We introduce a practical encoder-only transformer baseline as a robust reference for future HTR research. Building on this, we propose a two-stage unlearning framework for multi-head transformer HTR models. Our method combines neural pruning with machine unlearning applied to a writer classification head, ensuring sensitive information is removed while preserving the recognition head. We also present Writer-ID Confusion (WIC), a method that forces the forget set to follow a uniform distribution over writer identities, unlearning user-specific cues while maintaining text recognition performance. We compare WIC with Random Labeling, Fisher Forgetting, Amnesiac Unlearning, and DELETE within our prune-unlearn pipeline and consistently achieve better privacy-accuracy trade-offs. This is the first systematic study of machine unlearning for HTR. Using metrics such as Accuracy, Character Error Rate (CER), Word Error Rate (WER), and Membership Inference Attacks (MIA) on the IAM and CVL datasets, we demonstrate that our method achieves state-of-the-art or superior performance for effective unlearning. These experiments show that our approach effectively safeguards privacy without compromising accuracy, opening new directions for document analysis research. Our code is publicly available at https://github.com/leitro/WIC-WriterIDConfusion-MachineUnlearning.
(Pattern Recognition, Volume 172, Article 112411)
Citations: 0
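The Writer-ID Confusion objective, pushing the writer head toward a uniform distribution over identities on the forget set while the recognition head stays supervised on the retain set, can be sketched directly. The recognition term is shown as plain cross-entropy for brevity; the actual HTR decoder, loss, and weighting may differ from this assumption.

```python
import torch
import torch.nn.functional as F

def wic_unlearning_loss(writer_logits_forget, text_logits_retain, text_targets_retain,
                        lambda_forget=1.0):
    """Sketch of a Writer-ID-Confusion-style objective: drive writer predictions on
    the forget set toward a uniform distribution while supervising recognition on
    the retain set."""
    num_writers = writer_logits_forget.size(-1)
    uniform = torch.full_like(writer_logits_forget, 1.0 / num_writers)
    log_probs = F.log_softmax(writer_logits_forget, dim=-1)
    # KL divergence between the uniform target and the predicted writer distribution.
    confusion = F.kl_div(log_probs, uniform, reduction="batchmean")
    # Placeholder recognition loss on the retain set (cross-entropy over characters).
    recognition = F.cross_entropy(text_logits_retain, text_targets_retain)
    return recognition + lambda_forget * confusion

if __name__ == "__main__":
    writer_logits = torch.randn(8, 100)        # 100 writer identities (assumed)
    text_logits = torch.randn(8, 80)           # 80 character classes (assumed)
    text_targets = torch.randint(0, 80, (8,))
    print(wic_unlearning_loss(writer_logits, text_logits, text_targets))
```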
Adaptive integration of textual context and visual embeddings for underrepresented vision classification
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-08 | DOI: 10.1016/j.patcog.2025.112420
Seongyeop Kim, Hyung-Il Kim, Yong Man Ro
Abstract: The advancement of deep learning has significantly improved image classification performance; however, handling long-tail distributions remains challenging due to the limited data available for rare classes. Existing approaches predominantly focus on visual features, often neglecting the valuable contextual information provided by textual data, which can be especially beneficial for classes with sparse visual examples. In this work, we introduce a method that addresses this limitation by integrating textual data generated by advanced language models with visual inputs through our newly proposed Adaptive Integration Block for Vision-Text Synergy (AIB-VTS). Specifically designed for Vision Transformer architectures, AIB-VTS adaptively balances visual and textual information during inference, effectively utilizing textual descriptions generated by large language models. Extensive experiments on benchmark datasets demonstrate substantial performance improvements across all class groups, particularly in underrepresented (tail) classes. These results confirm the effectiveness of our approach in leveraging textual context to mitigate data scarcity and enhance model robustness.
(Pattern Recognition, Volume 172, Article 112420)
Citations: 0
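Adaptively balancing visual and textual embeddings at inference can be sketched as a learned per-dimension gate over the two modalities. The gate below is a generic formulation assumed for illustration, not the AIB-VTS block itself.

```python
import torch
import torch.nn as nn

class AdaptiveVisionTextGate(nn.Module):
    """Sketch of gated fusion of a visual embedding and a textual-context embedding;
    the dimension and the sigmoid gate are assumptions."""
    def __init__(self, dim=768):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, visual, textual):
        # visual, textual: (B, D) embeddings of the image and its class description
        g = self.gate(torch.cat([visual, textual], dim=-1))  # per-dimension weight in (0, 1)
        return g * visual + (1 - g) * textual                 # fused representation

if __name__ == "__main__":
    fuse = AdaptiveVisionTextGate()
    v, t = torch.randn(4, 768), torch.randn(4, 768)
    print(fuse(v, t).shape)   # torch.Size([4, 768])
```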
Federated automatic latent variable selection in multi-output Gaussian processes
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-08 | DOI: 10.1016/j.patcog.2025.112410
Jingyi Gao, Seokhyun Chung
Abstract: This paper explores a federated learning approach that automatically selects the number of latent processes in multi-output Gaussian processes (MGPs). The MGP has seen great success as a transfer learning tool when data is generated from multiple sources/units/entities. A common approach to transferring knowledge across units with MGPs is to gather all data on a central server and extract common independent latent processes, expressing each unit as a linear combination of the shared latent patterns. However, this approach poses key challenges in (i) determining the adequate number of latent processes and (ii) relying on centralized learning, which leads to potential privacy risks and significant computational burdens on the central server. To address these issues, we propose a hierarchical model that places spike-and-slab priors on the coefficients of each latent process. These priors automatically select only the needed latent processes by shrinking the coefficients of unnecessary ones to zero. To estimate the model while avoiding the drawbacks of centralized learning, we propose a variational inference-based approach that formulates model inference as an optimization problem compatible with federated settings. We then design a federated learning algorithm that allows units to jointly select and infer the common latent processes without sharing their data. We also discuss an efficient learning approach for a new unit within the proposed federated framework. Simulation and case studies on Li-ion battery degradation and air temperature data demonstrate the advantageous features of our proposed approach.
(Pattern Recognition, Volume 172, Article 112410)
Citations: 0
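The generative view behind the model, with each output a linear combination of shared latent Gaussian process draws and spike-and-slab coefficients that can switch unneeded latent processes off, can be sketched in NumPy. This only illustrates the prior structure; the paper infers these quantities variationally and in a federated setting, which the sketch does not attempt.

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.3):
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_mgp_spike_slab(x, num_outputs=3, num_latent=5, pi=0.4, seed=0):
    """Generative sketch: draw Q shared latent GP functions, then build M outputs
    as linear combinations whose coefficients follow a spike-and-slab pattern
    (a Bernoulli indicator times a Gaussian 'slab' value)."""
    rng = np.random.default_rng(seed)
    K = rbf_kernel(x) + 1e-6 * np.eye(len(x))
    latent = rng.multivariate_normal(np.zeros(len(x)), K, size=num_latent)  # (Q, N)
    slab = rng.normal(0.0, 1.0, size=(num_outputs, num_latent))             # coefficient values
    spike = rng.binomial(1, pi, size=(num_outputs, num_latent))             # inclusion indicators
    coeffs = spike * slab                                                    # zeros switch latents off
    outputs = coeffs @ latent                                                # (M, N)
    return outputs, coeffs

if __name__ == "__main__":
    x = np.linspace(0, 1, 50)
    y, A = sample_mgp_spike_slab(x)
    print(y.shape, (A != 0).sum(axis=1))   # how many latent processes each output uses
```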
PCFFusion: Progressive cross-modal feature fusion network for infrared and visible images
IF 7.6 | CAS Q1 (Computer Science)
Pattern Recognition | Pub Date: 2025-09-08 | DOI: 10.1016/j.patcog.2025.112419
Shuying Huang, Kai Zhang, Yong Yang, Weiguo Wan
Abstract: Infrared and visible image fusion (IVIF) aims to fuse the thermal target information in infrared images and the spatial texture information in visible images, improving the observability and comprehensibility of the fused images. Currently, most IVIF methods suffer from the loss of salient target information and texture details in fused images. To alleviate this problem, we propose a progressive cross-modal feature fusion network (PCFFusion) for IVIF that comprises two stages: feature extraction and feature fusion. In the feature extraction stage, a feature decomposition module (FDM) extracts features of the two modalities at different scales through a feature decomposition operation (FDO), enhancing the network's feature representation capability. In addition, by establishing correlations between the high-frequency and low-frequency components of the two modal features, a cross-modal feature enhancement module (CMFEM) corrects and enhances the two features at each scale. The feature fusion stage fuses the two modal features at each scale and supplements adjacent-scale features through three cross-domain fusion modules (CDFMs). To encourage the fused results to preserve more salient targets and richer texture details, a dual-feature fidelity loss is defined, in which a salient weight map balances the two loss terms. Extensive experiments demonstrate that the fusion results of the proposed method highlight prominent targets from infrared images while retaining rich background details from visible images, and that PCFFusion outperforms several advanced methods. Specifically, compared with the best results of the comparison methods, the proposed network achieves average improvements of 30.35% in Mutual Information (MI) and 10.9% in Standard Deviation (SD) on the TNO dataset.
(Pattern Recognition, Volume 172, Article 112419)
Citations: 0
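The dual-feature fidelity loss, in which a salient weight map decides per pixel whether the fused image should follow the infrared target or the visible texture, can be sketched as below. The sigmoid-of-intensity weight map is an assumption standing in for the paper's salient weight map.

```python
import torch

def dual_fidelity_loss(fused, infrared, visible):
    """Sketch of a salient-weight-balanced fidelity loss.
    All tensors: (B, 1, H, W) in [0, 1]."""
    # Salient weight map: brighter infrared pixels (thermal targets) get larger weights
    # (an assumed surrogate for the paper's salient weight map).
    w = torch.sigmoid(10 * (infrared - infrared.mean(dim=(2, 3), keepdim=True)))
    loss_ir = (w * (fused - infrared).abs()).mean()        # stay close to thermal targets
    loss_vis = ((1 - w) * (fused - visible).abs()).mean()  # keep visible texture elsewhere
    return loss_ir + loss_vis

if __name__ == "__main__":
    ir, vis = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
    fused = (ir + vis) / 2
    print(dual_fidelity_loss(fused, ir, vis))
```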