{"title":"AnchorFormer: Differentiable anchor attention for efficient vision transformer","authors":"Jiquan Shan , Junxiao Wang , Lifeng Zhao , Liang Cai , Hongyuan Zhang , Ioannis Liritzis","doi":"10.1016/j.patrec.2025.07.016","DOIUrl":"10.1016/j.patrec.2025.07.016","url":null,"abstract":"<div><div>Recently, vision transformers (ViTs) have achieved excellent performance on vision tasks by measuring the global self-attention among the image patches. Given <span><math><mi>n</mi></math></span> patches, they will have quadratic complexity such as <span><math><mrow><mi>O</mi><mrow><mo>(</mo><msup><mrow><mi>n</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>)</mo></mrow></mrow></math></span> and the time cost is high when splitting the input image with a small granularity. Meanwhile, the pivotal information is often randomly gathered in a few regions of an input image, some tokens may not be helpful for the downstream tasks. To handle this problem, we introduce an anchor-based efficient vision transformer (<strong>AnchorFormer</strong>), which employs the anchor tokens to learn the pivotal information and accelerate the inference. Firstly, by estimating the bipartite attention between the anchors and tokens, the complexity will be reduced from <span><math><mrow><mi>O</mi><mrow><mo>(</mo><msup><mrow><mi>n</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>)</mo></mrow></mrow></math></span> to <span><math><mrow><mi>O</mi><mrow><mo>(</mo><mi>m</mi><mi>n</mi><mo>)</mo></mrow></mrow></math></span>, where <span><math><mi>m</mi></math></span> is an anchor number and <span><math><mrow><mi>m</mi><mo><</mo><mi>n</mi></mrow></math></span>. Notably, by representing the anchors with the neurons in a neural layer, we can differentiably learn these anchors and approximate global self-attention through the Markov process. It avoids the burden caused by non-differentiable operations and further speeds up the approximate attention. Moreover, we extend the proposed model to three downstream tasks including classification, detection, and segmentation. Extensive experiments show the effectiveness of AnchorFormer, e.g., achieving up to a <em><strong>9.0%</strong></em> higher accuracy or <em><strong>46.7%</strong></em> FLOPs reduction on ImageNet classification, <em><strong>81.3%</strong></em> higher mAP on COCO detection under comparable FLOPs, as compared to the current baselines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 124-131"},"PeriodicalIF":3.3,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144738450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved fine-tuning of mask-aware transformer for personalized face inpainting with semantic-aware regularization","authors":"Yuan Zeng , Yijing Sun , Yi Gong","doi":"10.1016/j.patrec.2025.07.009","DOIUrl":"10.1016/j.patrec.2025.07.009","url":null,"abstract":"<div><div>Recent advances in generative models have led to significant improvements in the challenging task of high-fidelity image inpainting. How to effectively guide or control these powerful models to perform personalized tasks becomes an important open problem. In this letter, we introduce a semantic-aware fine-tuning method for adapting a pre-trained image inpainting model, mask-aware transformer (MAT), to personalized face inpainting. Unlike existing methods, which tune a personalized generative prior with multiple reference images, our method can recover the key facial features of the individual with only few input references. To improve the fine-tuning stability in a setting with few reference images, we propose a multiscale semantic-aware regularization to encourage the generated key facial components to match those in the reference. Specifically, we generate a mask to extract the key facial components as prior knowledge and impose a semantic-based regularization on these regions at multiple scales, with which the fidelity and identity preservation of facial components are significantly promoted. Extensive experiments demonstrate that our method can generate high-fidelity personalized face inpainting results using only three reference images, which is much fewer than personalized inpainting baselines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 95-101"},"PeriodicalIF":3.3,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144721453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Contrastive Predictive Coding for Time Series Out-Of-Distribution Detection Applied to Human Activity Data","authors":"Amirhossein Ahmadian, Fredrik Lindsten","doi":"10.1016/j.patrec.2025.07.011","DOIUrl":"10.1016/j.patrec.2025.07.011","url":null,"abstract":"<div><div>Contrastive Predictive Coding (CPC) is a well-established self-supervised learning method that naturally fits time series data. This method has been recently leveraged to detect anomalous inputs, viewed as the task of classifying positive pairs of context-feature representations versus negative ones in order to employ classifier uncertainty measures. In this paper, by taking a different perspective, we propose a CPC-based Out-Of-Distribution (OOD) detection method for time series data that does not require any negative samples at test time and is theoretically related to a probabilistic type of uncertainty estimation in the latent representation space. Our method extends the standard CPC by using a radial (distance-based) score function both in the training loss and as the OOD measure, in addition to quantizing the context (replacing it by cluster prototypes) during inference. The proposed method is applied to detecting OOD human activities with smartphone sensors data and shows promising performance on two primary datasets without using activity labels in training.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 132-138"},"PeriodicalIF":3.3,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144766909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimized Deep Isolation Forest","authors":"Łukasz Gałka","doi":"10.1016/j.patrec.2025.07.014","DOIUrl":"10.1016/j.patrec.2025.07.014","url":null,"abstract":"<div><div>Anomaly detection and the identification of elements that do not fit the data characteristics are increasingly used in information systems, both for data cleaning and for finding unusual elements. Unsupervised anomaly detection methods are particularly useful in this context. This paper introduces the Optimized Deep Isolation Forest (ODIF) as an optimized version of the Deep Isolation Forest (DIF) algorithm. The training of DIF is subjected to an optimization of the operations performed, which leads to a reduction of the computational and memory complexity. In a series of experiments, both DIF and ODIF are implemented, and their effectiveness is evaluated using Area Under the Precision-Recall Curve (PR AUC). The proposed method demonstrates significantly better detection performance compared to the baseline Isolation Forest and competitive techniques. Additionally, the execution times of the training phase are measured for both the CPU and GPU stages, as well as memory usage, including RAM and VRAM. The results unequivocally indicate a much faster execution of the ODIF algorithm compared to DIF, with average CPU stage and GPU stage times being over one and a half times and nearly 150 times shorter, respectively. Similarly, memory usage is significantly reduced for ODIF in comparison to DIF, with RAM consumption lowered by approximately 18% and VRAM by over 55%.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 88-94"},"PeriodicalIF":3.9,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving LLM-based opinion expression identification with dependency syntax","authors":"Qiujing Xu , Peiming Guo , Fei Li , Meishan Zhang , Donghong Ji","doi":"10.1016/j.patrec.2025.07.012","DOIUrl":"10.1016/j.patrec.2025.07.012","url":null,"abstract":"<div><div>Opinion expression identification (OEI), a crucial task in fine-grained opinion mining, has received long-term attention for several decades. Recently, large language models (LLMs) have demonstrated substantial potential on the task. However, structural-aware syntax features, which have proven highly effective for encoder-based OEI models, remain challenging to be explored under the LLM paradigm. In this work, we introduce a novel approach that successfully enhances LLM-based OEI with the aid of dependency syntax. We start with a well-formed prompt learning framework for OEI, and then enrich the prompting text with syntax information from an off-the-shelf dependency parser. To mitigate the negative impact of irrelevant dependency structures, we employ a BERT-based CRF model as a retriever to select only salient dependencies. Experiments on three benchmark datasets covering English, Chinese and Portuguese indicate that our method is highly effective, resulting in significant improvements on all datasets. We also provide detailed analysis to understand our method in-depth.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 81-87"},"PeriodicalIF":3.9,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MVFormer: Diversifying feature normalization and token mixing for efficient vision transformers","authors":"Jongseong Bae , Susang Kim , Minsu Cho , Ha Young Kim","doi":"10.1016/j.patrec.2025.07.019","DOIUrl":"10.1016/j.patrec.2025.07.019","url":null,"abstract":"<div><div>Active research is currently underway to enhance the efficiency of vision transformers (ViTs). Most studies have focused solely on token mixers, overlooking the potential relationship with normalization. To boost diverse feature learning, we propose two components: multi-view normalization (MVN) and multi-view token mixer (MVTM). The MVN integrates three differently normalized features via batch, layer, and instance normalization using a learnable weighted sum, expected to offer diverse feature distribution to the token mixer, resulting in beneficial synergy. The MVTM is a convolution-based multiscale token mixer with local, intermediate, and global filters which incorporates stage specificity by configuring various receptive fields at each stage, efficiently capturing ranges of visual patterns. By adopting both in the MetaFormer block, we propose a novel ViT, multi-vision transformer (MVFormer). Our MVFormer outperforms state-of-the-art convolution-based ViTs on image classification with the same or lower parameters and MACs. Particularly, MVFormer variants, MVFormer-T, S, and B achieve 83.4 %, 84.3 %, and 84.6 % top-1 accuracy, respectively, on ImageNet-1 K benchmark.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 72-80"},"PeriodicalIF":3.9,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing change detection in multi-date images using a Multi-temporal Siamese Neural Network","authors":"Farah Chouikhi , Ali Ben Abbes , Imed Riadh Farah","doi":"10.1016/j.patrec.2025.07.010","DOIUrl":"10.1016/j.patrec.2025.07.010","url":null,"abstract":"<div><div>Satellite imagery’s temporal and spatial variability presents challenges for accurate change detection. To address this challenge, we propose a Multi-temporal Siamese Variational Auto-Encoder (MSVAE). MSVAE obtains latent representations by concatenating extracted features from multi-date images while sharing weights. This architecture combines the advantages of Siamese networks and variational auto-encoders (VAE), ensuring both spatial and temporal consistency of the extracted features for desertification detection. We conducted experiments in the arid regions of Tunisia using Landsat imagery and supervised classification techniques for desertification detection. The results demonstrated a classification accuracy of 98.46% for the proposed MSVAE, outperforming other models, such as the Multi-temporal Siamese Convolutional Neural Network (MSCNN) and the Multi-temporal Siamese Recurrent Neural Network (MSRNN).</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 65-71"},"PeriodicalIF":3.9,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"O2Flow: Object-aware optical flow estimation","authors":"Ziyi Liu, Jiawei Wang, Haixu Bi, Hongmin Liu","doi":"10.1016/j.patrec.2025.06.023","DOIUrl":"10.1016/j.patrec.2025.06.023","url":null,"abstract":"<div><div>Optical flow estimation is a fundamental task in computer vision that aims to obtain dense pixel motion of adjacent frames. It is widely considered a low-level vision task. Existing methods mainly focus on capturing local or global matching clues for pixel motion modeling, while neglecting object-level information. However, human perception of motion is closely linked to high-level object understanding. To introduce the object awareness into the optical flow estimation pipeline, we propose an Object-aware Optical Flow Estimation framework (O<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Flow) comprising two branches: the Object Awareness (OA) Branch and the Flow Prediction (FP) Branch. Specifically, the FP-branch serves the basic optical flow prediction function, while the OA-branch is designed to capture the object-level information, guided by an auxiliary moving object prediction task. Extensive experimental results demonstrate that our method significantly enhances optical flow estimation performance in these challenging regions. Compared with the other two-view methods, O<span><math><msup><mrow></mrow><mrow><mn>2</mn></mrow></msup></math></span>Flow achieves state-of-the-art results on the Sintel and KITTI-2015 benchmarks.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 58-64"},"PeriodicalIF":3.9,"publicationDate":"2025-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144702923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biometric characteristics of hand gestures through joint decomposition of cross-subject and cross-session biases","authors":"Aman Verma , Gaurav Jaswal , Seshan Srirangarajan , Sumantra Dutta Roy","doi":"10.1016/j.patrec.2025.07.007","DOIUrl":"10.1016/j.patrec.2025.07.007","url":null,"abstract":"<div><div>Hand gestures are natural in human–computer interfaces for their use in gesture-based control, as well as personalized devices with privacy preservation and device personalization. However, some hand gestures bear biometric traits, whereas some others generalize well across subjects. The ‘entanglement’ between these concepts is in gestures with tight intra-subject clusters (in a feature space), and low inter-subject separation. This puts forth a key requirement: <em>quantifying the biometric ‘goodness’ of a gesture so as to segregate between the gestures prudent for authentication and recognition-level generalization</em>. We propose a biometric characterization framework that estimates how uniquely and with how much variability, any unseen/seen subject would perform a particular gesture. We leverage a theoretical understanding of personality and stability biases present in cross-user (<em>CU</em>) and cross-session (<em>CS</em>) gesture recognition experiments. Structuring upon the biases that influence performance in these experiments, we derive mathematical relationships that quantify the biometric goodness of a gesture. In order to disentangle bias for identity understanding, we introduce two strategies based upon ‘bias mitigation’ and ‘bias intensification’. We empirically validate the proposed framework through experiments on three datasets. The proposed framework is generic and does not require user identity information. Additionally, this can operate over any existing hand-gesture recognition pipelines.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 44-50"},"PeriodicalIF":3.9,"publicationDate":"2025-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144685716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel ensemble approach for crop disease detection by leveraging customized EfficientNets and interpretability","authors":"Nahrin Jannat , S.M. Mahedy Hasan , Minhaz F. Zibran","doi":"10.1016/j.patrec.2025.07.008","DOIUrl":"10.1016/j.patrec.2025.07.008","url":null,"abstract":"<div><div>Crop leaf diseases present a persistent and serious threat to agricultural productivity and food security, especially in agro-based countries. An effective resolution of this issue demands the development of automated methods for the timely detection and management of crop diseases. In this work, we present a novel ensemble technique for the automatic detection of crop diseases, utilizing four diverse datasets: corn, potato, wheat, and tomato, each containing images of both healthy and disease-affected crop leaves. While previous studies often employed basic transfer learning (TL) techniques, we aimed to improve TL performance by systematically integrating different versions of EfficientNet and customizing their architectures with additional layers. A key contribution of our research is a novel model selection method for ensemble learning, which goes beyond traditional accuracy metrics by addressing misclassifications and class-specific shortcomings. We developed a tailored approach using misclassification counts and Hamming Loss to redefine the model selection process, identifying the most suitable EfficientNet models for each dataset. We applied Gradient Class Activation Mapping (Grad-CAM) to visualize the model’s prediction process and integrated Shapley Additive Explanations (SHAP) to enhance interpretability by providing detailed insights into feature contributions. Thus, we introduced an efficient and transparent technique for automatic crop disease detection, achieving over 99% accuracy, precision, recall, and F-Score across all datasets, significantly outperforming existing methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 370-377"},"PeriodicalIF":3.3,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145264780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}