{"title":"A small object detection method with context information for high altitude images","authors":"Zhengkai Ma , Linli Zhou , Di Wu , Xianliu Zhang","doi":"10.1016/j.patrec.2024.11.027","DOIUrl":"10.1016/j.patrec.2024.11.027","url":null,"abstract":"<div><div>Detection of small objects stands as a pivotal and difficult task because of their low resolution and lack of visualization features. Though achieving some promising results, recent detection methods utilize the context information insufficiently, leading to inadequate small object feature representation and increasing the misdetection and omission rates. We propose a method named Context Information Enhancement YOLO(CIE-YOLO) for small object detection. CIE-YOLO mainly includes a Context Reinforcement Module(CRM), a Channel Spatial Joint Attention(CSJA) module, and a Pixel Feature Enhancement Module(PFEM). The CRM module extracts and enhances the context information to mitigate the confusion between small objects and the background in the network. Then CSJA suppresses the background noise to highlight important small object features. Finally, PFEM reduces the small object feature losses in up-sampling via feature enhancement and pixel resolution enhancement. The effectiveness of the proposed CIE-YOLO in small object detection is demonstrated by extensive experiments.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 22-28"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clinical knowledge aware synthesized CT image-based framework for improved detection and segmentation of hemorrhages","authors":"Chitimireddy Sindhura, Subrahmanyam Gorthi","doi":"10.1016/j.patrec.2024.11.028","DOIUrl":"10.1016/j.patrec.2024.11.028","url":null,"abstract":"<div><div>Intracranial hemorrhage (ICH) is a life-threatening condition characterized by bleeding within the brain tissue, necessitating immediate diagnosis and treatment to improve survival rates. CT imaging is the most commonly used modality for ICH diagnosis. Current methods typically depend on extensive annotated datasets and complex networks, and do not explicitly utilize the patient-specific clinical insights, which are crucial for precise diagnoses. In this paper, we introduce a novel deep-learning framework that utilizes synthesized CT images infused with clinical brain information to enhance the detection and segmentation of hemorrhages. This approach enhances data by synthesizing CT images based on the midsagittal plane and creates an asymmetry map that highlights the differences between the left and right halves of the CT image. We evaluated the performance of this approach using state-of-the-art deep learning architectures on two public datasets, INSTANCE and BHSD data sets, comprising around 300 CT scans with various types of haemorrhages. Results show that incorporating anatomical information improves the Dice Similarity Coefficient (DSC) for ICH segmentation by 7%–12% and increases detection accuracy by 4%–8%. Our findings suggest that incorporating prior anatomical knowledge can significantly enhance automated ICH diagnosis systems, paving the way for more reliable diagnostic solutions, even with limited data availability.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 46-52"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A cross-feature interaction network for 3D human pose estimation","authors":"Jihua Peng , Yanghong Zhou , P.Y. Mok","doi":"10.1016/j.patrec.2025.01.016","DOIUrl":"10.1016/j.patrec.2025.01.016","url":null,"abstract":"<div><div>The task of estimating 3D human poses from single monocular images is challenging because, unlike video sequences, single images can hardly provide any temporal information for the prediction. Most existing methods attempt to predict 3D poses by modeling the spatial dependencies inherent in the anatomical structure of the human skeleton, yet these methods fail to capture the complex local and global relationships that exist among various joints. To solve this problem, we propose a novel Cross-Feature Interaction Network to effectively model spatial correlations between body joints. Specifically, we exploit graph convolutional networks (GCNs) to learn the local features between neighboring joints and the self-attention structure to learn the global features among all joints. We then design a cross-feature interaction (CFI) module to facilitate cross-feature communications among the three different features, namely the local features, global features, and initial 2D pose features, aggregating them to form enhanced spatial representations of human pose. Furthermore, a novel graph-enhanced module (GraMLP) with parallel GCN and multi-layer perceptron is introduced to inject the skeletal knowledge of the human body into the final representation of 3D pose. Extensive experiments on two datasets (Human3.6M (Ionescu et al., 2013) and MPI-INF-3DHP (Mehta et al., 2017)) show the superior performance of our method in comparison to existing state-of-the-art (SOTA) models. The code and data are shared at <span><span>https://github.com/JihuaPeng/CFI-3DHPE</span><svg><path></path></svg></span></div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 175-181"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143335308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global–local feature-mixed network with template update for visual tracking","authors":"Li Zhao , Chenxiang Fan , Min Li , Zhonglong Zheng , Xiaoqin Zhang","doi":"10.1016/j.patrec.2024.11.034","DOIUrl":"10.1016/j.patrec.2024.11.034","url":null,"abstract":"<div><div>Deep learning trackers have succeeded with a powerful local and global feature extraction capacity. However, both Siamese-based trackers with local convolution and Transformer-based trackers with global Transformer do not fully utilize frames. These trackers cannot obtain accurate tracking when they are faced with target appearance changes. This paper proposes a global–local features mixed tracker named GLT to complement the advantages of global and local frame features. GLT uses depth-wise convolution with dynamic weight to get local features and residual Transformer to get global features. Owing to global and local details, our method can perform accurate and robust tracking. Meanwhile, GLT has a template update strategy based on the key frame to face long-term tracking challenge. Numerous experiments show that our GLT achieves excellent performance on short-term and long-term benchmarks, including GOT-10k, TrackingNet and LaSOT. Furthermore, without many attention operations like other Transformer-based trackers, our GLT has fewer parameters and runs in real-time.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 111-116"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143149700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SP-Det: Anchor-based lane detection network with structural prior perception","authors":"Libo Sun, Hangyu Zhu, Wenhu Qin","doi":"10.1016/j.patrec.2024.11.030","DOIUrl":"10.1016/j.patrec.2024.11.030","url":null,"abstract":"<div><div>Effective perception and accurate localization of lane lines are the key points for intelligent vehicles to plan local driving paths and realize lane keeping and departure warning. However, the elongated structure of lane lines makes the performance of detectors degrade significantly when visual cues are scarce. The continuity of lane lines also puts forward higher requirements for the ability of algorithms to model long-range dependencies. In this paper, we propose a novel anchor-based lane detection network (SP-Det) combining the unique structural characteristics and pixel distribution of lane lines. Specifically, we introduce a Semantic-Guided Feature Calibration Unit (SG-FCU) to semantically calibrate and refine features from different layers and to narrow the semantic gap during fusion. Additionally, we propose a Spatial-aware Context Aggregation Block (S-CAB) and a Lane-aware Information Enhancement Module (LIEM) to improve the prediction accuracy of horizontal offsets of line anchors through global feature encoding and row-wise information sharing. The results of quantitative and qualitative experiments show that SP-Det achieves state-of-the-art performance on CULane and Tusimple benchmark datasets.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 60-66"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An asymmetric heuristic for trained ternary quantization based on the statistics of the weights: An application to medical signal classification","authors":"Yamil Vindas , Emmanuel Roux , Blaise Kévin Guépié , Marilys Almar , Philippe Delachartre","doi":"10.1016/j.patrec.2024.11.016","DOIUrl":"10.1016/j.patrec.2024.11.016","url":null,"abstract":"<div><div>One of the main challenges in the field of deep learning and embedded systems is the mismatch between the memory, computational and energy resources required by the former for good performance and the resource capabilities offered by the latter. It is therefore important to find a good trade-off between performance and computational resources used. In this study, we propose a novel ternarization heuristic based on the statistics of the weights, in addition to asymmetric pruning. Our approach involves the computation of two asymmetric thresholds based on the mean and standard deviation of the weights. This allows us to distinguish between positive and negative values prior to ternarization. Two hyperparameters are introduced into these thresholds, which permit the user to control the trade-off between compression and classification performance. Following thresholding, ternarization is carried out in accordance with the methodology of trained ternary quantization (TTQ). The efficacy of the method is evaluated on three datasets, two of which are medical: a cerebral emboli (HITS) dataset, an epileptic seizure recognition (ESR) dataset, and the MNIST dataset. Two types of deep learning models were tested: 2D convolutional neural networks (CNNs) and 1D CNN-transformers. The results demonstrate that our approach, aTTQ, achieves a superior trade-off between classification performance and compression rate compared with TTQ, for all the models and datasets. In fact, our method is capable of reducing the memory requirements of a 1D CNN-transformer model for the ESR dataset by over 21% compared to TTQ, while maintaining a Matthews correlation coefficient of 95%. The code is available at: <span><span>https://github.com/yamilvindas/aTTQ</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 37-45"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New advances in body composition assessment with ShapedNet: A single image deep regression approach","authors":"Navar Medeiros M. Nascimento , Pedro Cavalcante de Sousa Junior , Pedro Yuri Rodrigues Nunes , Suane Pires Pinheiro da Silva , Luiz Lannes Loureiro , Victor Zaban Bittencourt , Valden Luis Matos Capistrano Junior , Pedro Pedrosa Rebouças Filho","doi":"10.1016/j.patrec.2024.11.029","DOIUrl":"10.1016/j.patrec.2024.11.029","url":null,"abstract":"<div><div>We introduce a novel technique called ShapedNet to enhance body composition assessment. This method employs a deep neural network capable of estimating Body Fat Percentage (BFP), performing individual identification, and enabling localization using a single photograph. The accuracy of ShapedNet is validated through comprehensive comparisons against the gold standard method, Dual-Energy X-ray Absorptiometry (DXA), utilizing 1273 healthy adults spanning various ages, sexes, and BFP levels. The results demonstrate that ShapedNet outperforms in 19.5% state of the art computer vision-based approaches for body fat estimation, achieving a Mean Absolute Percentage Error (MAPE) of 4.91% and Mean Absolute Error (MAE) of 1.42. The study evaluates both gender-based and Gender-neutral approaches, with the latter showcasing superior performance. The method estimates BFP with 95% confidence within an error margin of 4.01% to 5.81%. This research advances multi-task learning and body composition assessment theory through ShapedNet.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 88-94"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced multiscale attentional feature fusion model for defect detection on steel surfaces","authors":"Yongkai Xia , Yang Lu , Xiaoheng Jiang , Mingliang Xu","doi":"10.1016/j.patrec.2024.11.024","DOIUrl":"10.1016/j.patrec.2024.11.024","url":null,"abstract":"<div><div>Surface defect detection has been an important part of controlling the quality of industrial products. Detecting small defects poses a persistent challenge, primarily due to the scarcity of available information. To solve this problem, this paper proposes an improved model Bi-Level Efficient Global YOLO (BEG-YOLO) based on YOLOv8x. The architecture consists of Cross Stage Partial DenseNet Global Feature Pyramid Network (CSP-GFPN), and Bi-Level Efficient Attention (BEA). CSP-GFPN is a feature pyramid network structure that employs enhanced feature fusion across scales. It extends the neck network of BEG-YOLO while enabling information sharing across different spatial scales and potentially non-adjacent semantic layers. BEA is a fusion of attention. It serves to eliminate most of the irrelevant feature key–value pair inputs at the broader feature map level, thus focusing more comprehensively on the few relevant regions that remain. Three public datasets, NEU-DET, GC10-DET, and X-SDD, are used in this experiment. According to the experimental results, better results are achieved in terms of <span><math><mrow><mi>m</mi><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mn>50</mn></mrow></msub></mrow></math></span>, <span><math><mrow><mi>m</mi><mi>A</mi><msub><mrow><mi>P</mi></mrow><mrow><mn>50</mn><mo>−</mo><mn>95</mn></mrow></msub></mrow></math></span>, precision, and recall compared to other good YOLO series models. Code is available at <span><span>https://github.com/xyk1300332643/BEG-YOLO</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 15-21"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Progressive self-supervised learning: A pre-training method for crowd counting","authors":"Yao Gu , Zhe Zheng , Yingna Wu, Guangping Xie, Na Ni","doi":"10.1016/j.patrec.2024.12.007","DOIUrl":"10.1016/j.patrec.2024.12.007","url":null,"abstract":"<div><div>Crowd counting technologies possess substantial social significance, and deep learning methods are increasingly seen as potent tools for advancing this field. Traditionally, many approaches have sought to enhance model performance by transferring knowledge from ImageNet, utilizing its classification weights to initialize models. However, the application of these pre-training weights is suboptimal for crowd counting, which involves dense prediction significantly different from image classification. To address these limitations, we introduce a progressive self-supervised learning approach, designed to generate more suitable pre-training weights from a large collection of density-related images. We gathered 173k images using custom-designed prompts and implemented a two-stage learning process to refine the feature representations of image patches with similar densities. In the first stage, mutual information between overlapping patches within the same image is maximized. Subsequently, a combination of global and local losses is evaluated to enhance feature similarity, with the latter assessing patches from different images of comparable densities. Our innovative pre-training approach demonstrated substantial improvements, reducing the Mean Absolute Error (MAE) by 7.5%, 17.6%, and 28.7% on the ShanghaiTech Part A & Part B and UCF_QNRF datasets respectively. Furthermore, when these pre-training weights were used to initialize existing models, such as CSRNet for density map regression and DM-Count for point supervision, a significant enhancement in performance was observed.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"188 ","pages":"Pages 148-154"},"PeriodicalIF":3.9,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143150837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep network pruning: A comparative study on CNNs in face recognition","authors":"Fernando Alonso-Fernandez , Kevin Hernandez-Diaz , Jose Maria Buades Rubio , Prayag Tiwari , Josef Bigun","doi":"10.1016/j.patrec.2025.01.023","DOIUrl":"10.1016/j.patrec.2025.01.023","url":null,"abstract":"<div><div>The widespread use of mobile devices for all kinds of transactions makes necessary reliable and real-time identity authentication, leading to the adoption of face recognition (FR) via the cameras embedded in such devices. Progress of deep Convolutional Neural Networks (CNNs) has provided substantial advances in FR. Nonetheless, the size of state-of-the-art architectures is unsuitable for mobile deployment, since they often encompass hundreds of megabytes and millions of parameters. We address this by studying methods for deep network compression applied to FR. In particular, we apply network pruning based on Taylor scores, where less important filters are removed iteratively. The method is tested on three networks based on the small SqueezeNet (1.24M parameters) and the popular MobileNetv2 (3.5M) and ResNet50 (23.5M) architectures. These have been selected to showcase the method on CNNs with different complexities and sizes. We observe that a substantial percentage of filters can be removed with minimal performance loss. Also, filters with the highest amount of output channels tend to be removed first, suggesting that high-dimensional spaces within popular CNNs are over-dimensioned. The models of this paper are available at <span><span>https://github.com/HalmstadUniversityBiometrics/CNN-pruning-for-face-recognition</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"189 ","pages":"Pages 221-228"},"PeriodicalIF":3.9,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143377631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}