{"title":"Incremental watershed cuts: Interactive segmentation algorithm with parallel strategy","authors":"Quentin Lebon , Josselin Lefèvre , Jean Cousty , Benjamin Perret","doi":"10.1016/j.patrec.2024.12.005","DOIUrl":"10.1016/j.patrec.2024.12.005","url":null,"abstract":"<div><div>In this article, we design an incremental method for computing seeded watershed cuts for interactive image segmentation. We propose an algorithm based on the hierarchical image representation called the binary partition tree to compute a seeded watershed cut. Additionally, we leverage properties of minimum spanning forests to introduce a parallel method for labeling a connected component. We show that these algorithms fit naturally into an interactive segmentation process by handling user interactions, such as seed addition or removal, in linear time with respect to the number of affected pixels. Run-time comparisons with several state-of-the-art interactive and non-interactive watershed methods show that the proposed method handles user interactions much faster than previous methods, with a significant speedup ranging from 10 to 60 on both 2D and 3D images, thus improving the user experience on large images.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 256-263"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143519646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
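The seeded-watershed idea in the abstract above rests on a classical equivalence: a watershed cut is a minimum spanning forest rooted at the seeds. A minimal Kruskal-style sketch of that non-incremental baseline (this is the standard construction, not the paper's incremental binary-partition-tree algorithm) looks like this:

```python
# Kruskal-style seeded watershed cut: process edges by increasing weight and
# merge components with union-find, except when the merge would join two
# components that already carry different seed labels (a watershed edge).

def watershed_cut(n_pixels, edges, seeds):
    """edges: list of (weight, u, v); seeds: dict mapping pixel -> label."""
    parent = list(range(n_pixels))
    label = [seeds.get(p) for p in range(n_pixels)]

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        lu, lv = label[ru], label[rv]
        if lu is not None and lv is not None and lu != lv:
            continue  # both sides already seeded differently: keep them apart
        parent[ru] = rv
        label[rv] = lu if lv is None else lv  # propagate the seed label

    return [label[find(p)] for p in range(n_pixels)]
```

On a 4-pixel path with a high-weight edge in the middle and seeds at both ends, the forest splits at the costly edge, labeling each side by its seed.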
{"title":"Inter-separability and intra-concentration to enhance stochastic neural network adversarial robustness","authors":"Omar Dardour , Eduardo Aguilar , Petia Radeva , Mourad Zaied","doi":"10.1016/j.patrec.2025.02.028","DOIUrl":"10.1016/j.patrec.2025.02.028","url":null,"abstract":"<div><div>It has been shown that Deep Neural Networks can be easily fooled by adding imperceptible noise to their inputs, producing so-called adversarial examples. To address this issue, in this paper, we propose a defense method called Inter-Separability and Intra-Concentration Stochastic Neural Networks (ISIC-SNN). The proposed ISIC-SNN method learns to enlarge the separation between different label representations using label embedding and a designed inter-separability loss. It introduces uncertainty in the feature latent space using the variational information bottleneck method and enhances the compactness of the stochastic features using an intra-concentration loss. Finally, it uses dot-product similarity between stochastic feature representations and label embeddings to classify features. ISIC-SNN is trained with standard training, which is much more efficient than adversarial training. Experiments on the SVHN, CIFAR-10, and CIFAR-100 datasets demonstrate the superior defensive capability of the proposed method compared to various SNN defense methods.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"191 ","pages":"Pages 1-7"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
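The two loss terms named in the ISIC-SNN abstract can be illustrated loosely in numpy. This is an interpretation, not the paper's exact formulation: an inter-separability term that penalizes similar label embeddings, and an intra-concentration term that pulls each stochastic feature toward its class embedding.

```python
import numpy as np

def inter_separability(E):
    """Mean cosine similarity between distinct label embeddings (rows of E).
    Lower means the label representations are more spread apart."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    sim = E @ E.T
    n = len(E)
    return (sim.sum() - np.trace(sim)) / (n * (n - 1))

def intra_concentration(feats, labels, E):
    """Mean squared distance of each feature to its own class embedding.
    Lower means features of a class are more compact around the embedding."""
    return float(np.mean(np.sum((feats - E[labels]) ** 2, axis=1)))
```

With orthogonal embeddings the inter-separability term is zero, and features sitting exactly on their class embedding give zero intra-concentration.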
{"title":"Editorial of the special section: CIARP 2023","authors":"Inês Domingues, Verónica Vasconcelos, Simão Paredes","doi":"10.1016/j.patrec.2024.12.013","DOIUrl":"10.1016/j.patrec.2024.12.013","url":null,"abstract":"","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 239-240"},"PeriodicalIF":3.9,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143518960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive shape estimation for densely cluttered objects","authors":"Jiangfan Ran, Haibin Yan","doi":"10.1016/j.patrec.2025.02.026","DOIUrl":"10.1016/j.patrec.2025.02.026","url":null,"abstract":"<div><div>Accurately recognizing the shape of objects in dense and cluttered scenes is important for robots to perform a variety of manipulation tasks, such as grasping and packing. However, the performance of previous shape estimation methods is not satisfactory due to the heavy occlusion between objects in dense clutter. In this paper, we propose an interactive exploration framework to estimate the shape of densely cluttered objects. Our framework utilizes pixel-wise uncertainty to generate efficient interactions, allowing it to achieve a better trade-off between shape estimation accuracy and interaction cost. Specifically, the extracted features are utilized as network weights to predict the confidence of each proposal located on the surface of the objects. Proposals with higher confidence are considered reliable results for shape estimation. Meanwhile, we obtain the uncertainty of shape and scale estimation based on the confidence of each proposal, and further propose an adaptive fusion strategy to construct a pixel-wise estimation uncertainty height map. In addition, our proposed interaction strategy leverages the uncertainty height map to generate effective interaction actions that significantly improve the shape estimation accuracy for severely occluded objects. Therefore, the optimal accuracy-efficiency trade-off for shape estimation in dense clutter is achieved by iterating the shape estimation and interaction actions. Extensive experimental results verify the effectiveness of the proposed approach. In challenging cases, the proposed approach achieves 66.7% and 52.0% lower average Chamfer distance than direct estimation and random interaction, respectively.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"191 ","pages":"Pages 8-14"},"PeriodicalIF":3.9,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143534494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
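The uncertainty-guided interaction loop described above can be sketched in a few lines. This is a deliberately loose illustration: the paper's adaptive fusion is richer than the fixed weighted sum used here, and `alpha` is an assumed parameter, not a value from the paper.

```python
import numpy as np

def fuse_uncertainty(shape_u, scale_u, alpha=0.5):
    """Fuse per-pixel shape and scale uncertainty maps into one height map.
    A plain convex combination stands in for the paper's adaptive fusion."""
    return alpha * shape_u + (1 - alpha) * scale_u

def next_interaction(height_map):
    """Pick the most uncertain pixel as the next interaction target."""
    return np.unravel_index(np.argmax(height_map), height_map.shape)
```

Iterating estimation and interaction then amounts to re-estimating shapes, rebuilding the height map, and probing its maximum until the uncertainty budget is spent.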
{"title":"Multi-corpus emotion recognition method based on cross-modal gated attention fusion","authors":"Elena Ryumina , Dmitry Ryumin , Alexandr Axyonov , Denis Ivanko , Alexey Karpov","doi":"10.1016/j.patrec.2025.02.024","DOIUrl":"10.1016/j.patrec.2025.02.024","url":null,"abstract":"<div><div>Automatic emotion recognition techniques are critical to natural human–computer interaction. However, current methods suffer from limited applicability due to their tendency to overfit on single-corpus datasets, which reduces their real-world effectiveness when faced with new, unseen corpora. We propose the first multi-corpus multimodal emotion recognition method with high generalizability, evaluated through a leave-one-corpus-out protocol. The method uses three fine-tuned encoders per modality (audio, video, and text) and a decoder employing context-independent gated attention to combine features from all three modalities. The research is conducted on four benchmark corpora: MOSEI, MELD, IEMOCAP, and AFEW. The proposed method achieves state-of-the-art results on these corpora and establishes the first baseline for multi-corpus studies. We demonstrate that, due to MELD's rich emotional expressiveness across all three modalities, models trained on it exhibit the best generalization ability when applied to the other corpora. We also find that the AFEW annotations correlate best with those of MOSEI, MELD, and IEMOCAP, and show the best cross-corpus performance, consistent with widely accepted theories of basic emotions.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 192-200"},"PeriodicalIF":3.9,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
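The gated attention fusion named in the abstract above can be sketched minimally: a learned gate assigns each modality a weight in (0, 1) before summation. The shapes and the per-modality scalar gate are illustrative assumptions, not the paper's trained decoder.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feats, W, b):
    """feats: (3, d) audio/video/text feature vectors.
    W: (3, 3*d) gate weights, b: (3,) gate biases.
    Returns the gate-weighted sum of the modality features and the gates."""
    gates = sigmoid(W @ feats.reshape(-1) + b)  # one scalar gate per modality
    fused = (gates[:, None] * feats).sum(axis=0)
    return fused, gates
```

With zero-initialized gate parameters every gate is 0.5, so fusion degrades gracefully to a scaled average; training then moves the gates to favor the more informative modalities.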
{"title":"Man2Marine : Marine mammal sound classification in small samples by transfer learning from human sound data","authors":"Qianglong Yi, Chenggang Xie, Donghai Guan, Weiwei Yuan","doi":"10.1016/j.patrec.2025.02.025","DOIUrl":"10.1016/j.patrec.2025.02.025","url":null,"abstract":"<div><div>The lack of annotated training data for marine mammal sounds poses a challenge for the use of large-scale neural network models trained in a supervised manner. Consequently, previous studies have primarily focused on classifying a limited number of species for which sufficient data are available. Drawing inspiration from the overlapping frequency ranges of human voice and marine mammal sounds, we propose a solution that uses large amounts of unannotated human speech data for self-supervised audio pre-training, followed by fine-tuning the model on marine mammal sounds. Experiments on three different datasets compare the proposed method with leading models in the voiceprint field, demonstrating that it achieves the best results, with an average classification accuracy of 91%. Furthermore, to address the over-parameterization of pre-trained models, we employ knowledge distillation to condense the model parameters, balancing model size against classification accuracy. The final model's parameters are reduced by 99.97% while maintaining a classification accuracy of 94%, comparable to the original model. This research showcases the successful application of large amounts of human voice data in the field of bioacoustics. The method significantly reduces the cost of acquiring marine mammal sound data and offers a promising approach for marine mammal sound classification in practical applications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 185-191"},"PeriodicalIF":3.9,"publicationDate":"2025-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
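The knowledge-distillation step mentioned above is usually the standard soft-target loss: cross-entropy on the hard labels plus a KL term between temperature-softened teacher and student distributions. A numpy sketch follows; the temperature `T` and mixing weight `alpha` are assumed hyperparameters, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled, numerically stable softmax."""
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_logits, y, T=2.0, alpha=0.5):
    """Hinton-style distillation: alpha * T^2 * KL(teacher || student)
    at temperature T, plus (1 - alpha) * cross-entropy on hard label y."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)))
    ce = -np.log(softmax(student_logits)[y])
    return alpha * (T * T) * kl + (1 - alpha) * ce
```

When the student already matches the teacher the KL term vanishes, leaving only the weighted hard-label cross-entropy.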
{"title":"Multimodal AI-enhanced ship detection for mapping fishing vessels and informing on suspicious activities","authors":"Alessandro Galdelli , Gagan Narang , Rocco Pietrini , Micol Zazzarini , Andrea Fiorani , Anna Nora Tassetti","doi":"10.1016/j.patrec.2025.02.022","DOIUrl":"10.1016/j.patrec.2025.02.022","url":null,"abstract":"<div><div>Ship detection using remote sensing and data from tracking devices like the Automatic Identification System (AIS) plays a critical role in maritime surveillance, supporting security, fisheries management, and efforts to combat illegal activities. However, challenges such as varying ship sizes, complex backgrounds, and intentional deactivation of AIS hinder accurate mapping. This study proposes a novel multimodal framework that integrates Sentinel-1 Synthetic Aperture Radar, Sentinel-2, and higher-resolution optical imagery. It features an enhanced deep-learning-based ship detection model combined with an AIS matchmaking algorithm to detect and cross-reference potentially suspicious maritime activities. The detection model is based on an enhanced You Only Look Once architecture, optimized for identifying small vessels in cluttered and noisy image backgrounds. The model achieves superior performance, surpassing state-of-the-art accuracy on multiple public datasets while reducing training time by 12% compared to baseline models. To ensure transparency within the pipeline, <em>Eigen-CAM</em> explainability techniques were employed, while <span><math><mrow><mi>C</mi><msub><mrow><mi>O</mi></mrow><mrow><mn>2</mn></mrow></msub></mrow></math></span> emissions were minimized during training using <em>CodeCarbon</em>, aligning the process with environmentally sustainable practices. Finally, the effectiveness of the pipeline was validated in a case study, successfully identifying potential ‘dark vessels’ and highlighting their possible involvement in suspicious activities.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"191 ","pages":"Pages 15-22"},"PeriodicalIF":3.9,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
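The AIS matchmaking idea underlying the 'dark vessel' flagging can be sketched simply: a detected ship with no AIS report within some distance threshold of its position is a candidate dark vessel. The haversine distance, Earth radius, and 1 km threshold below are illustrative assumptions, not the paper's algorithm.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = math.sin((lat2 - lat1) / 2) ** 2 + \
        math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def flag_dark_vessels(detections, ais_positions, max_km=1.0):
    """Return detections with no AIS position within max_km."""
    return [d for d in detections
            if all(haversine_km(d, a) > max_km for a in ais_positions)]
```

A production pipeline would also gate the match on time windows and vessel kinematics, but the spatial cross-reference is the core of it.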
{"title":"Selective directed graph convolutional network for skeleton-based action recognition","authors":"Chengyuan Ke , Sheng Liu , Yuan Feng , Shengyong Chen","doi":"10.1016/j.patrec.2025.02.020","DOIUrl":"10.1016/j.patrec.2025.02.020","url":null,"abstract":"<div><div>Skeleton-based action recognition has gained significant attention due to the lightweight and robust nature of skeleton representations. However, the feature extraction process often misses subtle action cues, making it challenging to differentiate between similar actions and leading to misclassification. To address this issue, we propose a Selective Directed Graph Convolutional Network (SD-GCN) that decouples features at varying granularities to enhance sensitivity to subtle actions. Specifically, we introduce a Dynamic Topology Generation (DTG) module, which dynamically constructs a new topological structure by focusing on key local joints. This reduces the influence of dominant global features on subtle ones, thereby amplifying fine-grained motion features and improving the distinction between similar actions. Additionally, we present an Attention-guided Group Fusion (AGF) module that selectively evaluates and fuses local motion features of the skeleton while incorporating global skeletal features to capture contextual relationships among all joints. We validated the effectiveness of our method on three benchmark datasets, and experimental results demonstrate that our model not only outperforms existing methods in terms of accuracy but also excels at distinguishing similar actions.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 141-146"},"PeriodicalIF":3.9,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143488384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
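The building block that SD-GCN variants refine is the plain graph convolution over the skeleton: joint features are propagated along a normalized adjacency matrix. A minimal numpy sketch of one such layer (the generic operation, not the paper's DTG or AGF modules) is:

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step over a skeleton graph.
    X: (n_joints, d_in) joint features; A: (n, n) adjacency; W: (d_in, d_out)."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-normalize by degree
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)  # propagate, project, ReLU
```

On a tiny 3-joint chain with identity features and weights, each joint's output is the average of itself and its neighbors, which is exactly the smoothing that dynamic-topology modules then counteract for fine-grained cues.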
{"title":"A Bayesian network combiner for multimodal handwriting analysis in Alzheimer’s disease detection","authors":"Emanuele Nardone, Tiziana D’Alessandro, Claudio De Stefano, Francesco Fontanella, Alessandra Scotto di Freca","doi":"10.1016/j.patrec.2025.02.019","DOIUrl":"10.1016/j.patrec.2025.02.019","url":null,"abstract":"<div><div>Alzheimer’s disease, recognized as the most widespread neurodegenerative disorder worldwide, strongly affects the cognitive ability of patients. The associated cognitive impairments range from mild to severe and are a risk factor for Alzheimer’s disease; they have profound implications for individuals, even while some daily functionality is maintained. Previous studies proposed a protocol of handwriting tasks as a potential diagnostic tool for predicting the symptoms of Alzheimer’s disease. The literature reveals that the potential of multimodal handwriting analysis, leveraging data from multiple handwriting tasks, has not been fully explored. Thus, we propose a two-stage multimodal approach for Alzheimer’s disease detection using handwriting data derived from the aforementioned protocol, which includes 25 tasks. In the first stage, static and dynamic handwriting features are extracted and fused with the subject’s personal information. The data obtained for each task are then used to train a single classifier, providing task-specific predictions; for each subject, the whole system thus provides 25 different predictions. In the second stage, a Bayesian Network is used to model task interdependencies and to select, via the Markov Blanket, the subset of tasks conditionally dependent on the class label. The experimental findings demonstrate that the proposed multimodal classifier-combination approach outperforms single-task classifiers and other ensemble methods. The proposed approach achieved the highest accuracy (86.98%) by applying the Majority Vote method to the tasks included in the Markov Blanket selection.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"190 ","pages":"Pages 177-184"},"PeriodicalIF":3.9,"publicationDate":"2025-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
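The second-stage combination described above reduces, for one subject, to restricting the 25 task-specific predictions to the Markov-blanket subset and taking a majority vote. A sketch (task names are placeholders, not the protocol's actual tasks):

```python
from collections import Counter

def majority_vote(task_preds, blanket):
    """task_preds: dict task_name -> predicted class for one subject.
    blanket: tasks selected by the Markov Blanket of the class label.
    Returns the most frequent prediction among the selected tasks."""
    votes = [task_preds[t] for t in blanket if t in task_preds]
    return Counter(votes).most_common(1)[0][0]
```

The Bayesian-network learning that produces `blanket` is the substantive contribution; once the subset is fixed, the combiner itself is this simple.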