{"title":"DDF2Pol: A Dual-Domain Feature Fusion Network for PolSAR Image Classification","authors":"Mohammed Q. Alkhatib","doi":"10.1016/j.patrec.2025.07.015","DOIUrl":"10.1016/j.patrec.2025.07.015","url":null,"abstract":"<div><div>This paper presents DDF2Pol, a lightweight dual-domain convolutional neural network for PolSAR image classification. The proposed architecture integrates two parallel feature extraction streams — one real-valued and one complex-valued — designed to capture complementary spatial and polarimetric information from PolSAR data. To further refine the extracted features, a depth-wise convolution layer is employed for spatial enhancement, followed by a coordinate attention mechanism to focus on the most informative regions. Experimental evaluations conducted on two benchmark datasets, Flevoland and San Francisco, demonstrate that DDF2Pol achieves superior classification performance while maintaining low model complexity. Specifically, it attains an Overall Accuracy (OA) of 98.16% on the Flevoland dataset and 96.12% on the San Francisco dataset, outperforming several state-of-the-art real- and complex-valued models. With only 91,371 parameters, DDF2Pol offers a practical and efficient solution for accurate PolSAR image analysis, even when training data is limited. The source code is publicly available at <span><span>https://github.com/mqalkhatib/DDF2Pol</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 110-116"},"PeriodicalIF":3.3,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144723664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Stage-wise Fusion Transformer for light field saliency detection","authors":"Wenhui Jiang , Qi Shu , Hongwei Cheng , Yuming Fang , Yifan Zuo , Xiaowei Zhao","doi":"10.1016/j.patrec.2025.07.005","DOIUrl":"10.1016/j.patrec.2025.07.005","url":null,"abstract":"<div><div>Light field salient object detection (SOD) has attracted tremendous research efforts recently. As the light field data contains multiple images with different characteristics, effectively integrating the valuable information from these images remains under-explored. Recent efforts focus on aggregating the complementary information from all-in-focus (AiF) and focal stack images (FS) late in the decoding stage. In this paper, we explore how learning the AiF and FS image encoders jointly can strengthen light field SOD. Towards this goal, we propose a Stage-wise Fusion Transformer (SF-Transformer) to aggregate the rich information from AiF image and FS images at different levels. Specifically, we present a Focal Stack Transformer (FST) for focal stacks encoding, which makes full use of the spatial-stack correlations for performant FS representation. We further introduce a Stage-wise Deep Fusion (SDF) which refines both AiF and FS image representation by capturing the multi-modal feature interactions in each encoding stage, thus effectively exploring the advantages of AiF and FS characteristics. We conduct comprehensive experiments on DUT-LFSD, HFUT-LFSD, and LFSD. The experimental results validate the effectiveness of the proposed method.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 117-123"},"PeriodicalIF":3.3,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144738449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AnchorFormer: Differentiable anchor attention for efficient vision transformer","authors":"Jiquan Shan , Junxiao Wang , Lifeng Zhao , Liang Cai , Hongyuan Zhang , Ioannis Liritzis","doi":"10.1016/j.patrec.2025.07.016","DOIUrl":"10.1016/j.patrec.2025.07.016","url":null,"abstract":"<div><div>Recently, vision transformers (ViTs) have achieved excellent performance on vision tasks by measuring the global self-attention among the image patches. Given <span><math><mi>n</mi></math></span> patches, they will have quadratic complexity such as <span><math><mrow><mi>O</mi><mrow><mo>(</mo><msup><mrow><mi>n</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>)</mo></mrow></mrow></math></span> and the time cost is high when splitting the input image with a small granularity. Meanwhile, the pivotal information is often randomly gathered in a few regions of an input image, some tokens may not be helpful for the downstream tasks. To handle this problem, we introduce an anchor-based efficient vision transformer (<strong>AnchorFormer</strong>), which employs the anchor tokens to learn the pivotal information and accelerate the inference. Firstly, by estimating the bipartite attention between the anchors and tokens, the complexity will be reduced from <span><math><mrow><mi>O</mi><mrow><mo>(</mo><msup><mrow><mi>n</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>)</mo></mrow></mrow></math></span> to <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mi>m</mi><mi>n</mi><mo>)</mo></mrow></mrow></math></span>, where <span><math><mi>m</mi></math></span> is an anchor number and <span><math><mrow><mi>m</mi><mo><</mo><mi>n</mi></mrow></math></span>. Notably, by representing the anchors with the neurons in a neural layer, we can differentiably learn these anchors and approximate global self-attention through the Markov process. It avoids the burden caused by non-differentiable operations and further speeds up the approximate attention. Moreover, we extend the proposed model to three downstream tasks including classification, detection, and segmentation. Extensive experiments show the effectiveness of AnchorFormer, e.g., achieving up to a <em><strong>9.0%</strong></em> higher accuracy or <em><strong>46.7%</strong></em> FLOPs reduction on ImageNet classification, <em><strong>81.3%</strong></em> higher mAP on COCO detection under comparable FLOPs, as compared to the current baselines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 124-131"},"PeriodicalIF":3.3,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144738450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved fine-tuning of mask-aware transformer for personalized face inpainting with semantic-aware regularization","authors":"Yuan Zeng , Yijing Sun , Yi Gong","doi":"10.1016/j.patrec.2025.07.009","DOIUrl":"10.1016/j.patrec.2025.07.009","url":null,"abstract":"<div><div>Recent advances in generative models have led to significant improvements in the challenging task of high-fidelity image inpainting. How to effectively guide or control these powerful models to perform personalized tasks becomes an important open problem. In this letter, we introduce a semantic-aware fine-tuning method for adapting a pre-trained image inpainting model, mask-aware transformer (MAT), to personalized face inpainting. Unlike existing methods, which tune a personalized generative prior with multiple reference images, our method can recover the key facial features of the individual with only few input references. To improve the fine-tuning stability in a setting with few reference images, we propose a multiscale semantic-aware regularization to encourage the generated key facial components to match those in the reference. Specifically, we generate a mask to extract the key facial components as prior knowledge and impose a semantic-based regularization on these regions at multiple scales, with which the fidelity and identity preservation of facial components are significantly promoted. Extensive experiments demonstrate that our method can generate high-fidelity personalized face inpainting results using only three reference images, which is much fewer than personalized inpainting baselines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 95-101"},"PeriodicalIF":3.3,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144721453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Deep Isolation Forest","authors":"Łukasz Gałka","doi":"10.1016/j.patrec.2025.07.014","DOIUrl":"10.1016/j.patrec.2025.07.014","url":null,"abstract":"<div><div>Anomaly detection and the identification of elements that do not fit the data characteristics are increasingly used in information systems, both for data cleaning and for finding unusual elements. Unsupervised anomaly detection methods are particularly useful in this context. This paper introduces the Optimized Deep Isolation Forest (ODIF) as an optimized version of the Deep Isolation Forest (DIF) algorithm. The training of DIF is subjected to an optimization of the operations performed, which leads to a reduction of the computational and memory complexity. In a series of experiments, both DIF and ODIF are implemented, and their effectiveness is evaluated using Area Under the Precision-Recall Curve (PR AUC). The proposed method demonstrates significantly better detection performance compared to the baseline Isolation Forest and competitive techniques. Additionally, the execution times of the training phase are measured for both the CPU and GPU stages, as well as memory usage, including RAM and VRAM. The results unequivocally indicate a much faster execution of the ODIF algorithm compared to DIF, with average CPU stage and GPU stage times being over one and a half times and nearly 150 times shorter, respectively. Similarly, memory usage is significantly reduced for ODIF in comparison to DIF, with RAM consumption lowered by approximately 18% and VRAM by over 55%.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 88-94"},"PeriodicalIF":3.9,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving LLM-based opinion expression identification with dependency syntax","authors":"Qiujing Xu , Peiming Guo , Fei Li , Meishan Zhang , Donghong Ji","doi":"10.1016/j.patrec.2025.07.012","DOIUrl":"10.1016/j.patrec.2025.07.012","url":null,"abstract":"<div><div>Opinion expression identification (OEI), a crucial task in fine-grained opinion mining, has received long-term attention for several decades. Recently, large language models (LLMs) have demonstrated substantial potential on the task. However, structural-aware syntax features, which have proven highly effective for encoder-based OEI models, remain challenging to be explored under the LLM paradigm. In this work, we introduce a novel approach that successfully enhances LLM-based OEI with the aid of dependency syntax. We start with a well-formed prompt learning framework for OEI, and then enrich the prompting text with syntax information from an off-the-shelf dependency parser. To mitigate the negative impact of irrelevant dependency structures, we employ a BERT-based CRF model as a retriever to select only salient dependencies. Experiments on three benchmark datasets covering English, Chinese and Portuguese indicate that our method is highly effective, resulting in significant improvements on all datasets. We also provide detailed analysis to understand our method in-depth.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 81-87"},"PeriodicalIF":3.9,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MVFormer: Diversifying feature normalization and token mixing for efficient vision transformers","authors":"Jongseong Bae , Susang Kim , Minsu Cho , Ha Young Kim","doi":"10.1016/j.patrec.2025.07.019","DOIUrl":"10.1016/j.patrec.2025.07.019","url":null,"abstract":"<div><div>Active research is currently underway to enhance the efficiency of vision transformers (ViTs). Most studies have focused solely on token mixers, overlooking the potential relationship with normalization. To boost diverse feature learning, we propose two components: multi-view normalization (MVN) and multi-view token mixer (MVTM). The MVN integrates three differently normalized features via batch, layer, and instance normalization using a learnable weighted sum, expected to offer diverse feature distribution to the token mixer, resulting in beneficial synergy. The MVTM is a convolution-based multiscale token mixer with local, intermediate, and global filters which incorporates stage specificity by configuring various receptive fields at each stage, efficiently capturing ranges of visual patterns. By adopting both in the MetaFormer block, we propose a novel ViT, multi-vision transformer (MVFormer). Our MVFormer outperforms state-of-the-art convolution-based ViTs on image classification with the same or lower parameters and MACs. Particularly, MVFormer variants, MVFormer-T, S, and B achieve 83.4 %, 84.3 %, and 84.6 % top-1 accuracy, respectively, on ImageNet-1 K benchmark.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 72-80"},"PeriodicalIF":3.9,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing change detection in multi-date images using a Multi-temporal Siamese Neural Network","authors":"Farah Chouikhi , Ali Ben Abbes , Imed Riadh Farah","doi":"10.1016/j.patrec.2025.07.010","DOIUrl":"10.1016/j.patrec.2025.07.010","url":null,"abstract":"<div><div>Satellite imagery’s temporal and spatial variability presents challenges for accurate change detection. To address this challenge, we propose a Multi-temporal Siamese Variational Auto-Encoder (MSVAE). MSVAE obtains latent representations by concatenating extracted features from multi-date images while sharing weights. This architecture combines the advantages of Siamese networks and variational auto-encoders (VAE), ensuring both spatial and temporal consistency of the extracted features for desertification detection. We conducted experiments in the arid regions of Tunisia using Landsat imagery and supervised classification techniques for desertification detection. The results demonstrated a classification accuracy of 98.46% for the proposed MSVAE, outperforming other models, such as the Multi-temporal Siamese Convolutional Neural Network (MSCNN) and the Multi-temporal Siamese Recurrent Neural Network (MSRNN).</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 65-71"},"PeriodicalIF":3.9,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"O2Flow: Object-aware optical flow estimation","authors":"Ziyi Liu, Jiawei Wang, Haixu Bi, Hongmin Liu","doi":"10.1016/j.patrec.2025.06.023","DOIUrl":"10.1016/j.patrec.2025.06.023","url":null,"abstract":"<div><div>Optical flow estimation is a fundamental task in computer vision that aims to obtain dense pixel motion of adjacent frames. It is widely considered a low-level vision task. Existing methods mainly focus on capturing local or global matching clues for pixel motion modeling, while neglecting object-level information. However, human perception of motion is closely linked to high-level object understanding. To introduce the object awareness into the optical flow estimation pipeline, we propose an Object-aware Optical Flow Estimation framework (O<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Flow) comprising two branches: the Object Awareness (OA) Branch and the Flow Prediction (FP) Branch. Specifically, the FP-branch serves the basic optical flow prediction function, while the OA-branch is designed to capture the object-level information, guided by an auxiliary moving object prediction task. Extensive experimental results demonstrate that our method significantly enhances optical flow estimation performance in these challenging regions. Compared with the other two-view methods, O<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Flow achieves state-of-the-art results on the Sintel and KITTI-2015 benchmarks.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 58-64"},"PeriodicalIF":3.9,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biometric characteristics of hand gestures through joint decomposition of cross-subject and cross-session biases","authors":"Aman Verma , Gaurav Jaswal , Seshan Srirangarajan , Sumantra Dutta Roy","doi":"10.1016/j.patrec.2025.07.007","DOIUrl":"10.1016/j.patrec.2025.07.007","url":null,"abstract":"<div><div>Hand gestures are natural in human–computer interfaces for their use in gesture-based control, as well as personalized devices with privacy preservation and device personalization. However, some hand gestures bear biometric traits, whereas some others generalize well across subjects. The ‘entanglement’ between these concepts is in gestures with tight intra-subject clusters (in a feature space), and low inter-subject separation. This puts forth a key requirement: <em>quantifying the biometric ‘goodness’ of a gesture so as to segregate between the gestures prudent for authentication and recognition-level generalization</em>. We propose a biometric characterization framework that estimates how uniquely and with how much variability, any unseen/seen subject would perform a particular gesture. We leverage a theoretical understanding of personality and stability biases present in cross-user (<em>CU</em>) and cross-session (<em>CS</em>) gesture recognition experiments. Structuring upon the biases that influence performance in these experiments, we derive mathematical relationships that quantify the biometric goodness of a gesture. In order to disentangle bias for identity understanding, we introduce two strategies based upon ‘bias mitigation’ and ‘bias intensification’. We empirically validate the proposed framework through experiments on three datasets. The proposed framework is generic and does not require user identity information. Additionally, this can operate over any existing hand-gesture recognition pipelines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 44-50"},"PeriodicalIF":3.9,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144685716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}