{"title":"Hierarchical gradient modulation for multi-resolution image registration","authors":"Luhang Shen , Jinfang Ouyang , Zizhao Guo , Na Ying , Huahua Chen , Chunsheng Guo","doi":"10.1016/j.patcog.2025.112525","DOIUrl":"10.1016/j.patcog.2025.112525","url":null,"abstract":"<div><div>In image registration, traditional methods often require manual supervision or the use of paired image data to optimize transformations, which can be both labor-intensive and limited in generalization. Unsupervised methods, by contrast, aim to automatically learn the optimal alignment between images without relying on labeled data, but they often struggle with balancing the complex trade-off between regularization and similarity loss, leading to issues with model tuning and weak generalization across diverse datasets. In this paper, a novel hierarchical gradient modulation strategy is proposed for multi-resolution image registration. This method introduces a compatibility criterion based on the relationship between the gradients of similarity loss and regularization loss. It evaluates the compatibility between similarity and regularization gradients, integrates them through intuitive strategies that align mutually reinforcing gradients, project conflicting gradients orthogonally to avoid interference, and balance gradients of equal importance through averaging. It prioritizes global deformation with stronger regularization at low resolution and focuses on fine details with reduced regularization at high resolution. Experimental results on common medical datasets, forward-looking sonar datasets, and fabric defect detection datasets demonstrate that the proposed method achieves superior registration performance compared to baseline methods and state-of-the-art hyperparameter-related research without incurring additional computational costs, achieving optimal loss balance in a multi-resolution structure.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112525"},"PeriodicalIF":7.6,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145267037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WFDENet: Wavelet-based frequency decomposition and enhancement network for diabetic retinopathy lesion segmentation","authors":"Xuan Li, Ding Ma, Xiangqian Wu","doi":"10.1016/j.patcog.2025.112492","DOIUrl":"10.1016/j.patcog.2025.112492","url":null,"abstract":"<div><div>The acquisition of precise semantic and detailed information is indispensable for high-accuracy diabetic retinopathy lesion segmentation (DRLS). To achieve this, noticing that high- and low-level encoder features respectively contain rich semantics and details, most existing DRLS methods focus on the design of delicate multi-level feature refinement and fusion manners. However, they ignore the exploration of intrinsic low- and high-frequency information of multi-level features, which can also describe the semantics and details. To fill this gap, we propose a Wavelet-based Frequency Decomposition and Enhancement Network (WFDENet), which simultaneously refines semantic and detailed representations by enhancing the low- and high-frequency components of the multi-level encoder features. Specifically, the low- and high-frequency components, which are acquired via discrete wavelet transform (DWT), are boosted by a low-frequency booster (LFB) and a high-frequency booster (HFB), respectively. High-frequency components contain abundant details but also more noise. To suppress the noise and strengthen critical features, in HFB, we devise a complex convolutional frequency attention module (CCFAM), which utilizes complex convolutions to generate dynamic complex-valued channel and spatial attention to improve the Fourier spectrum of high-frequency components. Moreover, considering the importance of multi-scale information, we aggregate the multi-scale frequency features to enrich the frequency components in both LFB and HFB. Experimental results on IDRiD and DDR datasets show that our WFDENet outperforms state-of-the-art methods. The source code is available at <span><span>https://github.com/xuanli01/WFDENet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112492"},"PeriodicalIF":7.6,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-modal mamba framework for RGB-T crowd counting with linear complexity","authors":"Chuan-Lin Gan, Rui-Sheng Jia, Hong-Mei Sun, Yuan-Chao Song","doi":"10.1016/j.patcog.2025.112522","DOIUrl":"10.1016/j.patcog.2025.112522","url":null,"abstract":"<div><div>Existing RGB-T crowd counting methods enhance counting accuracy by integrating RGB images with thermal imaging features. However, attention-based fusion methods have a computational complexity of <span><math><mrow><mi>O</mi><mo>(</mo><msup><mi>N</mi><mn>2</mn></msup><mo>)</mo></mrow></math></span>, which significantly increases computational costs. Moreover, current approaches fail to sufficiently retain the detailed information of the original modalities during feature fusion, leading to the loss of critical information. To address these issues, this paper proposes a cross-modal fusion network based on Mamba, named VMMNet. Specifically, a Dynamic State Space (DSS) block is designed using the selective scan mechanism, reducing the computational complexity of attention mechanisms from <span><math><mrow><mi>O</mi><mo>(</mo><msup><mi>N</mi><mn>2</mn></msup><mo>)</mo></mrow></math></span> to linear, thereby significantly improving network efficiency and inference speed. Furthermore, to tackle the issue of information loss during multimodal feature fusion, two innovative modules, the Cross-Mamba Enhancement Block (CMEB) and the Merge-Mamba Fusion Block (MMFB), are introduced. The CMEB enhances inter-modal information interaction through a cross-selective scan mechanism, while the MMFB further integrates the features output by CMEB to ensure information integrity. Finally, a Channel Aware Mamba Decoder (CMD) is designed to enhance the network’s modeling capability in the channel dimension. On existing RGB-T crowd counting datasets, VMMNet reduces FLOPs by 94.3 % compared to the state-of-the-art methods and achieves performance improvements of 18.7 % and 23.3 % in GAME(0) and RMSE, respectively.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112522"},"PeriodicalIF":7.6,"publicationDate":"2025-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145267040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using dynamic knowledge for kernel modulation: Towards image generation via one-shot multi-domain adaptation","authors":"Yusen Zhang, Min Li, Xianjie Zhang, Song Yan, Yujie He","doi":"10.1016/j.patcog.2025.112489","DOIUrl":"10.1016/j.patcog.2025.112489","url":null,"abstract":"<div><div>One-shot domain adaptation across multiple image domains aims to learn complex image distributions using just one training image from each target domain. Existing methods often select, preserve and transfer prior knowledge from the source domain pre-trained model to learn the target model with distinct styles. However, they always neglect the cross-domain shared knowledge and fail to consider the adaptation relationships in selecting source knowledge when facing varying styles in multi-target domain scenarios, casting doubt on their suitability. In this paper, we propose a novel one-shot multi-target image domain adaptation model based on kernel modulation. By leveraging the similarity information inherent in cross-domain knowledge to guide the correlation of feature learning, our method allows for precise control over the transfer of task-relevant knowledge while minimizing irrelevant information. Furthermore, we propose a novel cross-domain contrastive loss that incorporates dual constraints of structure and style. By effectively mining strong negative samples from cross-domain knowledge, it aims to maximize the structural feature from the source domain and accurately reflect the unique attributes of various style target domains. Extensive experiments on several datasets show that our method offers significant advantages in concurrently establishing multiple image domain mapping relationships. Moreover, it can effectively explore the potential for knowledge transfer in cross-domain feature learning, thereby generating higher-quality domain-adapted images.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112489"},"PeriodicalIF":7.6,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145267034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HADDNLP: Hyperspectral anomaly detection via double nonlocal priors","authors":"Longfei Ren , Degang Wang , Lianru Gao , Minghua Wang , Min Huang , Hongsheng Zhang","doi":"10.1016/j.patcog.2025.112535","DOIUrl":"10.1016/j.patcog.2025.112535","url":null,"abstract":"<div><div>Hyperspectral anomaly detection (HAD) is a promising approach that acts as an unsupervised strategy by distinguishing anomalies from the background. Low-rank representation (LRR) based methods that exploit global correlations at the image level are effective for HAD but often fail to capture long-range correlations, resulting in the loss of important structural details. To address the limitation, we develop a novel HAD via double nonlocal priors (HADDNLP) framework that preserves critical background structure. The proposed HADDNLP method first adopts the patch-wise nonlocal low-rank tensor (NLRT) modeling to explore global correlation along spectrum (GCS) and self-similarity (SS) across distant regions in hyperspectral images (HSIs), thereby preserving the structural and contextual details of the background. Then, the nonlocal means (NLM) prior is integrated to maintain spatial distribution within the HSIs, further enhancing the model’s ability to distinguish anomalies from the background. We optimize the model with an alternating minimization (AM) algorithm for NLRT estimation and an alternating direction method of multipliers (ADMM) for joint background reconstruction and anomaly detection. Experimental results on the real satellite and aerial hyperspectral datasets demonstrate that our proposed approach outperforms state-of-the-art methods in the HAD tasks.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112535"},"PeriodicalIF":7.6,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rethinking hard training sample generation for medical image segmentation","authors":"Zhibin Wan , Zhiqiang Gao , Mingjie Sun , Yang Yang , Cao Min , Hongliang He , Guohong Fu","doi":"10.1016/j.patcog.2025.112533","DOIUrl":"10.1016/j.patcog.2025.112533","url":null,"abstract":"<div><div>This paper tackles the task of synthetic data generation for downstream segmentation tasks, especially in data-scarce fields like medical diagnostics. Previous methods address the challenge of similar synthetic samples leading to model saturation by leveraging the specific downstream model to guide the generation process, and dynamically adjusting sample difficulty to prevent downstream performance plateaus. However, such an approach never considers the interoperability of these synthetic samples, which may not be universally challenging due to varying feature focuses across different downstream models. Thus, we propose a strategy that uses the discrepancy between backbone-extracted features and real image prototypes to generate challenging samples, employing two loss functions: one for key-area diversity and another for overall image fidelity. This ensures key areas are challenging while the background remains stable, creating samples that are broadly applicable for downstream tasks without overfitting to specific models. Our method, leveraging the data generated by our approach for model training, achieves an average mean Intersection over Union (mIoU) of 86.84% across five polyp test datasets, surpassing the state-of-the-art (SOTA) model CTNet [1] by a significant margin of 6.14%. Code is available at <span><span>https://github.com/Bbinzz/Rethinking-Hard-Training-Sample-Generation-for-Medical-Image-Segmentation</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112533"},"PeriodicalIF":7.6,"publicationDate":"2025-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Domain-adaptive point cloud semantic segmentation via knowledge-augmented deep learning","authors":"Fengyu Xu , Yongxiong Xiao , Jilin Mei , Yu Hu , Qiang Fu , Hang Shi","doi":"10.1016/j.patcog.2025.112528","DOIUrl":"10.1016/j.patcog.2025.112528","url":null,"abstract":"<div><div>Owing to limitations in both the quality and quantity of training data, supervised deep learning methods for point cloud semantic segmentation often exhibit poor generalization and hallucinations, particularly when unstructured scenes are processed in autonomous driving applications. These challenges motivate us to develop a domain-adaptive point cloud semantic segmentation network based on knowledge-augmented deep learning, which can be applied to off-road scenes after being trained on urban datasets. Specifically, our method adopts the strategy of combining implicit and explicit knowledge augmentation. Moreover, network modulation based on attribution analysis is employed to integrate domain knowledge into the deep learning model, thereby mitigating the model’s susceptibility to spurious correlations. Additionally, we propose new network modulation performance metrics to evaluate the efficiency and benefit of modulation correction. For experimental validation, we use datasets with significant disparities between the urban and off-road environments in the training and testing phases, respectively. Furthermore, we release our self-collected off-road dataset PCSS-Gobi-3D, which is the first point cloud dataset of the Gobi Desert scene. Compared with other domain-adaptive methods, our method demonstrates promising cross-domain 3D semantic segmentation performance with excellent results in terms of mean Intersection over Union (mIoU) and overall accuracy.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112528"},"PeriodicalIF":7.6,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266113","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache-aided cross-modal correlation correction for unsupervised cross-domain text-based person search","authors":"Kai Niu , Qinzi Zhao , Jiahui Chen , Yanning Zhang","doi":"10.1016/j.patcog.2025.112521","DOIUrl":"10.1016/j.patcog.2025.112521","url":null,"abstract":"<div><div>Unsupervised Cross-domain Text-Based Person Search (UC-TBPS) has to face not only the modality heterogeneity, but also the cross-domain difficulty in more practical surveillance circumstances. However, few research has focused on the cross-domain difficulty, which may severely hinder the real-world applications of TBPS. In this paper, we propose the Test-time Cache-aided Cross-modal Correlation Correction (TC<span><math><msup><mrow></mrow><mn>4</mn></msup></math></span>) method, which acts as a pioneer for especially addressing the UC-TBPS task by novel test-time re-ranking. Firstly, we conduct clustering inside the pedestrian image gallery, and construct the reward and penalty caches based on these clustering centers, to store more sentences relays for alleviating the cross-domain problem. Secondly, we calculate the reward and penalty values to refine the appropriately located image-sentence correlation positions under the guidance of these two caches, respectively. Finally, the refined image-sentence correlations are used to re-rank the original retrieval results. As a test-time re-ranking approach, our TC<span><math><msup><mrow></mrow><mn>4</mn></msup></math></span> method does not require fine-tuning in the target domain, and can obtain retrieval performance improvements with negligible additional overheads. Extensive experiments and analyses on the tasks of UC-TBPS as well as unsupervised cross-domain image-text matching can validate the effectiveness and generalization capacities of our proposed TC<span><math><msup><mrow></mrow><mn>4</mn></msup></math></span> solution.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112521"},"PeriodicalIF":7.6,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Refining the granularity of smoke representation: SAM-powered density-aware progressive smoke segmentation framework","authors":"Yichao Cao , Feng Yang , Xuanpeng Li , Xiaolin Meng , Xiaobo Lu","doi":"10.1016/j.patcog.2025.112517","DOIUrl":"10.1016/j.patcog.2025.112517","url":null,"abstract":"<div><div>Accurately segmenting smoke remains a challenging task due to its non-rigid morphology and semi-transparent nature, which causes pixel blending between smoke and background, leading to intertwined representations. Among these issues, smoke density plays a crucial role for refine the granularity of smoke representation, yet it has been largely overlooked in previous research. In this work, we introduce the SAM-powered (Segment Anything Model [1]) <strong>DenSi</strong>ty-Aware Progressive Smoke <strong>Seg</strong>mentation method (<em><strong>DenSiSeg</strong></em>). For smoke regions, we construct a background feature prototype to refine smoke mask labels into fine-grained density information using a per-pixel metric. Following this, soft-contrastive learning and progressive evolving strategies are devised to smoothly and iteratively refine the feature distribution of smoke at different density levels. For background regions, knowledge transfer based on the vision foundation model is employed, harnessing the world knowledge within the foundation model to enhance the understanding of diverse background. Extensive experiments on several public datasets demonstrate that the proposed <em>DenSiSeg</em> method significantly outperforms state-of-the-art methods. The code will be available on <span><span>https://github.com/Caoyichao/DenSiSeg</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112517"},"PeriodicalIF":7.6,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"W-EICMFusion: A fusion network for infrared and visible images utilising WOA hyperparameter optimisation","authors":"Xiaocong Wu , Xin Feng , Xuanlong Lu , Yi Yuan , Meina Huang , Yu Shen , Jiacheng Li","doi":"10.1016/j.patcog.2025.112531","DOIUrl":"10.1016/j.patcog.2025.112531","url":null,"abstract":"<div><div>Currently, most innovations in image fusion methods focus on designing network architectures for source image feature extraction and formulating loss functions for fusion networks, while often neglecting the necessary adjustments and optimisations of hyperparameters within these loss functions. However, the selection of hyperparameters for the loss function of the fusion network is crucial because it determines the iterative direction of the network and significantly influences the final results. This study proposes an image fusion network with hyperparameter adaptive optimisation adjustment, termed whale optimisation algorithm-edge invertible CMamba fusion (W-EICMFusion). First, we introduce a CMamba module that effectively extracts common information features within the local and broad receptive fields. We designed an edge-extraction invertible neural network (EE-INN) module that captures edge detail information from two modalities, and a fusion layer known as a residual dense efficient channel attention network (RDENet) to enhance the extraction of complementary information. Unlike other fusion networks that depend on manual parameter tuning, this study employs the whale optimisation algorithm (WOA) to determine the optimal hyperparameters adaptively. Our experiments compared our designed fusion network with 11 recently developed advanced methods related to it. The proposed method demonstrates the best overall performance and achieves the highest comprehensive score in comparative experiments conducted on the MSRS, LLVIP, Road-Scene, and M3FD datasets. Furthermore, it attained the highest detection accuracy in subsequent tasks, such as object detection. In the final design of the optimisation algorithm generalisation experiments, the proposed approach to hyperparameter adaptive optimisation produced superior classification outcomes. The code can be found at <span><span>https://github.com/ljcuestc/W-EICMFusion.git</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"172 ","pages":"Article 112531"},"PeriodicalIF":7.6,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145266115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}