LRTG3D: Large receptive field 3D object detection with truncated Gaussian denoising query
Changfeng Li, Xiaonan Mao, Zhiwei Ning, Jie Yang, Wei Liu
Pattern Recognition Letters, vol. 197, pp. 346–352, published 2025-09-08. DOI: 10.1016/j.patrec.2025.08.024

Abstract: In recent years, 3D object detection has emerged as a critical component of autonomous driving systems, drawing significant research interest. Classic voxel-based sparse convolutional neural networks (CNNs) have been widely used in single-modality detection due to their efficiency and accuracy in feature extraction. However, as detection heads become increasingly complex, the feature extraction capabilities of backbone networks often fall short, so the richness of backbone features must be improved to meet the needs of the detection task. In this paper, we propose a series of synergistic enhancements to the plain sparse CNN backbone. We introduce z-preserved downsampling (Z-PD) to expand the receptive field while preserving critical height information. At the core of our backbone is the novel dual-focus receptive field (DFRF) block, which integrates our proposed dual-scale spatial convolution (DSSC) to balance a large receptive field with precision, and hybrid-focus sparse convolution (HFSC) to robustly capture foreground features. Additionally, to accelerate convergence, we introduce a truncated Gaussian denoising query (T-GDQ) in the decoder to better align with the enhanced features. Extensive experiments on the nuScenes and Waymo datasets validate the effectiveness of the proposed method. Notably, our model achieves 67.3 mAP and 71.9 NDS on the nuScenes dataset, outperforming leading 3D detection approaches.
Introduction to the special section "Advance and future tendencies of intelligent systems & Pattern Recognition Applications" (SS:ISPR24)
Akram Bennour, Imad Ridha, Mohammed Al-Sarem
Pattern Recognition Letters, vol. 197, pp. 368–369, published 2025-09-02. DOI: 10.1016/j.patrec.2025.09.001
(Editorial introduction; no abstract.)
{"title":"A Siamese network-based large-size remote sensing change detection network based on differential enhancement","authors":"Shenbo Liu, Dongxue Zhao, Lijun Tang","doi":"10.1016/j.patrec.2025.08.020","DOIUrl":"10.1016/j.patrec.2025.08.020","url":null,"abstract":"<div><div>Existing change detection algorithms often face challenges in large-size remote sensing images, such as boundary discontinuity, insufficient correlation between semantic and change information, and inadequate extraction of differential information from dual-temporal images. To address these issues, this paper proposes a large-size remote sensing change detection network based on the design concept of differential enhancement, named DECD. By integrating attention mechanisms and staged difference extraction techniques, we have designed a large-scale dual-temporal difference enhancement module to accurately capture and enhance change features. Additionally, by leveraging the synergistic effect of change loss and segmentation loss, we have developed a segmentation-enhanced loss function, significantly improving the model’s segmentation performance. Compared with nine advanced algorithms on the WHU-CD, LEVIR-CD and MSRS-CD datasets, the F1 score of DECD was the best, reaching 90.98%, 91.75% and 76.66% respectively. In addition, the DECD inference speed was 11.78 ms, which is faster than FCCDN (15.29 ms) and Changeformer (28.78 ms).</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 319-324"},"PeriodicalIF":3.3,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144925526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
VMamba-Crowd: Bridging multi-scale features from Visual Mamba for weakly-supervised crowd counting
Zhanqiang Huo, Chunxin Yuan, Kunwei Zhang, Yingxu Qiao, Fen Luo
Pattern Recognition Letters, vol. 197, pp. 297–303, published 2025-08-29. DOI: 10.1016/j.patrec.2025.08.005

Abstract: Weakly-supervised crowd counting requires only count-level annotations instead of location-level annotations, which has made it a new research hotspot in the field of crowd counting. Currently, most deep-learning-based weakly-supervised crowd counting networks use CNNs and/or Transformers to extract features and build global context, overlooking multi-scale feature fusion and therefore yielding suboptimal feature representation and utilization. The more recent Mamba model, leveraging its selective state space mechanism, excels at feature extraction for image processing tasks, particularly at capturing multi-scale features without relying on self-attention. In this paper, we introduce carefully selected multi-scale features extracted from Visual Mamba into the weakly-supervised crowd counting task for the first time and propose the VMamba-Crowd model. Specifically, the Adjacent-scale Progressive Bridging Module (APBM) progressively facilitates interactions between adjacent high-level semantic and low-level detail information across both channel and spatial dimensions. The Mixed Regression Bridging Module (MRBM) performs a secondary mixed regression to bridge multi-scale global feature information. Extensive experiments demonstrate that VMamba-Crowd surpasses most existing weakly-supervised crowd counting networks and achieves competitive performance compared to fully-supervised ones. In particular, cross-dataset experiments confirm that our weakly-supervised method generalizes remarkably well.
{"title":"A lightweight multi-scaled semantic segmentation for underground mine images","authors":"Yuanbin Wang, Wenqing He, Qianxi Li, Xiaolong Wang, Wenjian Chang","doi":"10.1016/j.patrec.2025.08.019","DOIUrl":"10.1016/j.patrec.2025.08.019","url":null,"abstract":"<div><div>Analyzing complex scene images in underground mines is crucial for ensuring safe coal mining.The semantic segmentation of underground objects can contribute to grasp the complex information of underground mine. However, most existing approaches lose the details of small objects, and do not utilize features across different scales effectively. Simultaneously, the extended runtime of the model hinders the timely delivery of segmentation information for the mining task. Thus, a lightweight semantic segmentation method based on the DeeplabV3+ model is proposed for image segmentation. Firstly, to reduce model complexity while improving model segmentation performance, MobileNetV2 is utilized as the backbone network. Secondly, Atrous Spatial Pyramid Mixed Pooling (ASPMP) with mixed pooling module is presented, which leverages multiscaled features extracted from different scale objects under the mine. Meanwhile, the void rate of ASPMP is optimized for better extraction of smaller underground objects. Finally, in the stage of the decoder, Feature Fusion Module(FFM) containing the channel attention mechanism is constructed for the fusion of high and low features, and the residual network structure further reduces the computational load. Experimental results show that the proposed method substantially reduces the quantity of parameters and the amount of calculation while the segmentation precision is guaranteed. The proposed method achieves a balance between accuracy and efficiency on CUMT-CMUID dataset and Cityscapes dataset.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 325-331"},"PeriodicalIF":3.3,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144925527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dual Feature Enhancement Graph Clustering Network
Renda Han, Mengzhe Sun, Zeyi Li, Mengfei Li, Tianyu Hu, Zhenhua Yang, Jingxin Liu
Pattern Recognition Letters, vol. 197, pp. 339–345, published 2025-08-28. DOI: 10.1016/j.patrec.2025.08.016

Abstract: Deep graph clustering is a fundamental method in unsupervised learning. Recent deep clustering fusion methods based on representation learning typically employ Auto-Encoders (AEs) and Graph Neural Networks (GNNs) to capture high-dimensional representations of node attributes and graph structure. However, unimportant graph-structure information and redundant fused representations lead to a less discriminative graph representation, limiting clustering performance. To tackle this issue, we propose the Dual Feature Enhancement Graph Clustering Network (DFE-GCN). Specifically, we develop a critical-node selection mechanism that computes an importance score for each node and adjusts edge weights accordingly, weakening unimportant connections while strengthening important ones. Moreover, we design a heterogeneous information fusion strategy that fine-tunes the node attributes and graph structure fused layer by layer, dynamically filtering out redundant representations and forming a robust target distribution. Extensive experiments on five datasets show that the proposed method consistently outperforms advanced clustering methods.
{"title":"Dual interaction network with cross-image attention for medical image segmentation","authors":"Jeonghyun Noh , Wangsu Jeon , Jinsun Park","doi":"10.1016/j.patrec.2025.08.018","DOIUrl":"10.1016/j.patrec.2025.08.018","url":null,"abstract":"<div><div>Medical image segmentation is a crucial method for assisting professionals in diagnosing various diseases through medical imaging. However, various factors such as noise, blurriness, and low contrast often hinder the accurate diagnosis of diseases. While numerous image enhancement techniques can mitigate these issues, they may also alter crucial information needed for accurate diagnosis in the original image. Conventional image fusion strategies such as feature concatenation can address this challenge. However, they struggle to fully leverage the advantages of both original and enhanced images while suppressing the side effects of the enhancements. To overcome the problem, we propose a dual interactive fusion module (DIFM) that effectively exploits mutual complementary information from the original and enhanced images. DIFM employs cross-attention bidirectionally to simultaneously attend to corresponding spatial information across different images, subsequently refining the complementary features via global spatial attention. This interaction leverages low- to high-level features implicitly associated with diverse structural attributes like edges, blobs, and object shapes, resulting in enhanced features that embody important spatial characteristics. In addition, we introduce a multi-scale boundary loss based on gradient extraction to improve segmentation accuracy at object boundaries. Experimental results on the ACDC and Synapse datasets demonstrate the superiority of the proposed method quantitatively and qualitatively.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":""},"PeriodicalIF":3.3,"publicationDate":"2025-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144996364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An asymmetric teacher-student network based industrial vision model for abnormal grain detection of semiconductor cooling devices
Mengjie Tang, Jie Tu, Panyu Zhou, Kelvin K.L. Wong
Pattern Recognition Letters, vol. 197, pp. 288–296, published 2025-08-26. DOI: 10.1016/j.patrec.2025.08.021

Abstract: Unsupervised detection of micro-defects in thermoelectric cooler (TEC) grains is challenging because anomalies are subtle and low-contrast and manual annotation is costly. In this work, we propose ATS-Net, an asymmetric teacher-student network built on a shared ResNet backbone and designed as a lightweight, deployment-ready model that generalizes across datasets without per-dataset tuning. Our key contributions include exponential moving-average normalization to stabilize feature statistics, a single-layer Real-NVP coupling mechanism to amplify teacher-student discrepancies at anomaly regions, and a dual-scale contextual transformer block for joint local and global attention. ATS-Net is trained exclusively on defect-free samples in a two-stage process and evaluated using image-level AUROC, pixel-level PRO at 95% recall, and mean intersection-over-union (mIoU) on the proprietary TEC-Grain dataset and the public MVTec-AD benchmark. ATS-Net achieves 99.2% AUROC, 99.1% PRO, and 0.989 mIoU on TEC-Grain, and 98.8% AUROC, 98.6% PRO, and 0.97 mIoU on MVTec-AD. The model runs at 3.8 GFLOPs with 19.36 MB of parameters at 96 FPS on an RTX 3090 GPU. Ablation studies show the introduced modules collectively improve AUROC by 9.5 percentage points over the MKD baseline, with only a minor increase in parameters. ATS-Net thus balances detection accuracy, interpretability, and speed, making it suitable for real-time defect inspection in semiconductor cooling device production. Future research will focus on integrating multi-modal fusion and self-supervised pretraining to further reduce annotation requirements.
{"title":"Aggregated masked autoencoding for offline reinforcement learning","authors":"Changqing Yuan , Yongfang Xie , Shiwen Xie , Zhaohui Tang , Zongze Wu","doi":"10.1016/j.patrec.2025.08.007","DOIUrl":"10.1016/j.patrec.2025.08.007","url":null,"abstract":"<div><div>Viewing offline reinforcement learning (RL) as a sequence modeling problem has emerged as a new research trend. Recent approaches leverage self-supervised learning to improve sequence representations, yet most rely on state sequences for pretraining, thereby disrupting the intrinsic state–action coupling, which complicates the distinction of trajectory bifurcations caused by action quality differences. Moreover, actions from stochastic policies in offline datasets may cause low-quality state transitions to be mistakenly identified as salient information, hindering representation learning and degrading policy performance. To mitigate these issues, we propose aggregated masked future prediction (AMFP), a self-supervised learning framework for offline RL. AMFP introduces a new pretext task that combines weighted aggregation and masked autoencoding through global fusion tokens to perform aggregated masked reconstruction. The weighted aggregation mechanism is to assign higher weights to samples that are semantically similar to the anchor in the representation space, enabling the model to emphasize reliable state transitions and suppress misleading transitions from stochastic or low-quality actions. Meanwhile, the global fusion tokens serve a dual purpose: they facilitate the integration of weighted aggregation and masked autoencoding, and, after encoding, function as compressed representations of the state trajectory and implicit action-state coupling. The encoded representations are then utilized as the latent contextual factor to guide policy learning and improve robustness. Experimental evaluation on D4RL benchmarks demonstrates the effectiveness of our method in improving policy learning.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 312-318"},"PeriodicalIF":3.3,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144923060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
UFR-GAN: A lightweight multi-degradation image restoration model
Binh An Nguyen, Minh Bao Kha, Duc Manh Dao, Huu Kien Nguyen, My Duyen Nguyen, The Vu Nguyen, Namal Rathnayake, Yukinobu Hoshino, Tuan Linh Dang
Pattern Recognition Letters, vol. 197, pp. 282–287, published 2025-08-25. DOI: 10.1016/j.patrec.2025.08.008

Abstract: Real-world pattern recognition systems often must handle images degraded by multiple factors at once, such as rain, haze, and low light. Existing single-degradation models fail to generalize across diverse scenarios, while specialized models are computationally expensive and inefficient. This paper introduces UFR-GAN, a lightweight GAN-based framework for multi-degradation restoration. The model integrates transformer-driven feature aggregation to enhance long-range dependencies while maintaining computational efficiency, and employs frequency-domain contrastive learning to disentangle overlapping artifacts, improving the restoration of fine details. This approach may enable unified restoration across diverse degradation types. With only 45.4 M parameters, UFR-GAN achieves state-of-the-art (SOTA) performance with 26.87 PSNR and 0.83 SSIM across various degradation scenarios, at significantly lower computational complexity than SOTA approaches. Crucially, we demonstrate its impact on downstream pattern recognition tasks: integrating UFR-GAN with YOLOv11 improves vehicle detection accuracy by 23%, reaching 0.73 mAP50, while keeping inference time at only 2.7 ms under adverse weather conditions. These results indicate that UFR-GAN can serve many areas of pattern recognition, including computer vision, remote sensing, traffic surveillance, and self-driving systems, where efficient and generalizable restoration is essential.