Pattern Recognition Letters: Latest Articles

RobustPrompt: Learning to defend against adversarial attacks with adaptive visual prompts
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 161–168 · Pub Date: 2025-02-21 · DOI: 10.1016/j.patrec.2025.02.015
Chang Liu, Wenzhao Xiang, Yinpeng Dong, Xingxing Zhang, Liyuan Wang, Ranjie Duan, Shibao Zheng, Hang Su
Abstract: Adversarial training stands out as one of the most effective techniques for enhancing robustness by enriching the training data with adversarial examples. Nonetheless, when faced with various perturbation budgets, the model's performance can degrade notably, because different perturbations induce distinct distribution shifts in the adversarial examples. Fine-tuning is commonly employed to improve performance on a specific perturbation, but it can lead to catastrophic forgetting, where gains on new tasks come at the cost of degraded performance on previously learned ones. We frame this challenge as an incremental domain learning problem in continual learning. Inspired by the application of prompt techniques in vision models, we introduce RobustPrompt, which integrates additional guidance information about perturbation characteristics into the adversarial training process, enabling the model to adaptively enhance its robustness under varying perturbation budgets. Specifically, we define an adaptive prompt pool composed of a noise-level predictor and corresponding prompts for different perturbations. During training, prompts are injected into different layers of the model, guiding it to focus on the correct features. Experiments demonstrate that RobustPrompt enhances the adversarial robustness of the state-of-the-art Swin Transformer Base model, achieving an average improvement of 61.1% against PGD attacks and 56.9% against AutoAttack across five white-box settings; an average improvement of 76.1% against VMI-FGSM attacks across five black-box settings; and an average improvement of 53.7% on five datasets with natural noise. Our results underscore the potential of RobustPrompt as a useful tool for enhancing the reliability and robustness of transformers in image classification tasks.
Citations: 0
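The PGD attacks used in the evaluation above are a standard white-box baseline: repeated gradient-ascent steps on the loss, projected back into an L-infinity ball around the input. A minimal sketch against a toy logistic model, not the paper's code; the model, step size, and budget here are illustrative assumptions:

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-inf PGD against a logistic model p = sigmoid(w.x + b).

    Each step ascends the cross-entropy loss and projects the
    perturbed input back into the eps-ball around the original x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        grad = (p - y) * w                         # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)      # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project to L-inf ball
    return x_adv
```

The projection step is what enforces the perturbation budget; varying `eps` across attacks is exactly the setting in which the abstract reports distribution shift.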
Fine-grained cross-modality consistency mining for Continuous Sign Language Recognition
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 191, pp. 23–30 · Pub Date: 2025-02-21 · DOI: 10.1016/j.patrec.2025.02.017
Zhenghao Ke, Sheng Liu, Yuan Feng
Abstract: Continuous Sign Language Recognition (CSLR) involves the detection of sequential glosses from visual sign inputs. While current CSLR methods perform well with high-frequency glosses, primarily functors such as conjunctions and pronouns, they struggle to accurately recognize low-frequency content words, which are essential for conveying meaningful information. This challenge arises from limitations in existing datasets and the Connectionist Temporal Classification (CTC) training procedure, leading to poor generalization to diverse linguistic structures. As a result, CSLR systems face limited applicability in real-world scenarios. In this work, we introduce the Fine-Grained Cross-modality Consistency (FGXM) loss, a novel approach designed to align visual and linguistic models. The FGXM loss encourages consistency between visual and language representations, improving the model's ability to integrate visual context with linguistic understanding. We also propose the unweighted word error rate (uWER), an unbiased metric for CSLR performance. Unlike the conventional word error rate (WER), uWER provides a fairer evaluation by addressing the frequency imbalance between content words and functors, offering a more accurate measure of a model's real-world effectiveness. We extensively evaluate our approach across multiple datasets and models, demonstrating significant improvements in both accuracy and data efficiency.
Citations: 0
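The conventional WER that uWER refines is an edit-distance metric: the minimum number of word substitutions, insertions, and deletions to turn the hypothesis into the reference, divided by the reference length. A minimal sketch of standard WER (the paper's uWER reweighting of content words versus functors is not reproduced here):

```python
def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / len(r)
```

Because every word counts equally, frequent functors dominate this score, which is precisely the imbalance the proposed uWER is meant to correct.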
StableID: Multimodal learning for stable identity in personalized Text-to-Face generation
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 153–160 · Pub Date: 2025-02-21 · DOI: 10.1016/j.patrec.2025.02.018
Xueping Wang, Yixuan Gao, Yanan Liu, Feihu Yan, Guangzhe Zhao
Abstract: Personalized Text-To-Face (TTF) generation aims to inject new subjects (e.g., identity information) into a text-to-image diffusion model, generating images that align with text prompts while maintaining subject consistency across different contexts. Current methods tend either to overfit the reference images on text prompts related to facial attributes, or to ignore facial details in order to fit the text prompts, weakening identity consistency. To address these issues, we propose a personalized TTF method for generating Stable IDentities without fine-tuning, named StableID. First, a multimodal-guided identity constraint is proposed to ensure the stability of identity features and the preservation of facial details, along with semantic editing capabilities. Second, we design a residual cross-attention based mask balancing loss that effectively separates identity information from non-identity backgrounds, balancing the effects of text prompts and identity constraints. Furthermore, we develop a portrait dataset with detailed facial prompts, as well as decoupled editable attribute vectors, enabling smooth and precise control over fine-grained semantic edits. Extensive experimental results show that our method outperforms the state of the art in stable identity consistency.
Citations: 0
ConsistentDreamer: View-consistent meshes through balanced multi-view Gaussian optimization
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 118–125 · Pub Date: 2025-02-21 · DOI: 10.1016/j.patrec.2025.02.016
Onat Şahin, Mohammad Altillawi, George Eskandar, Carlos Carbone, Ziyuan Liu
Abstract: Recent advances in diffusion models have significantly improved 3D generation, enabling the use of assets generated from an image in embodied AI simulations. However, the one-to-many nature of the image-to-3D problem limits their use due to inconsistent content and quality across views. Previous models optimize a 3D model by sampling views from a view-conditioned diffusion prior, but diffusion models cannot guarantee view consistency. Instead, we present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them with another diffusion model through a score distillation sampling (SDS) loss. Thereby, we limit the discrepancies between the views guided by the SDS loss and ensure a consistent rough shape. In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction. To balance the rough-shape and fine-detail optimizations, we introduce dynamic task-dependent weights based on homoscedastic uncertainty, updated automatically in each iteration. Additionally, we employ opacity, depth distortion, and normal alignment losses to refine the surface for mesh extraction. Our method ensures better view consistency and visual quality compared to the state of the art.
Citations: 0
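Homoscedastic-uncertainty task weighting (in the style of Kendall et al.) is commonly written as a total loss of the form sum_i exp(-s_i) * L_i + s_i, where each s_i = log sigma_i^2 is a learnable log-variance that automatically down-weights noisier tasks. A sketch under that standard formulation; the paper's exact weighting scheme may differ:

```python
import numpy as np

def weighted_total_loss(losses, log_vars):
    """Combine per-task losses with homoscedastic-uncertainty weights.

    total = sum_i exp(-s_i) * L_i + s_i, with s_i = log sigma_i^2.
    The exp(-s_i) factor shrinks a task's contribution as its learned
    uncertainty grows; the + s_i term penalizes inflating uncertainty.
    """
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))
```

In training, the `log_vars` would be optimized jointly with the model parameters, which is what lets the rough-shape/fine-detail balance update automatically each iteration.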
Fractional concepts in neural networks: Enhancing activation functions
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 126–132 · Pub Date: 2025-02-19 · DOI: 10.1016/j.patrec.2025.02.013
Vojtech Molek, Zahra Alijani
Abstract: This study explores the integration of fractional calculus in neural networks by introducing fractional-order derivatives (FOD) as tunable parameters in activation functions, enabling diverse function adaptation. We evaluate these fractional activation functions across datasets and architectures, comparing them with traditional and novel functions to assess their effects on accuracy, computational efficiency, and memory usage. Findings indicate that fractional functions, especially the fractional Sigmoid, can yield better performance, though challenges persist.
Citations: 0
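Fractional-order derivatives are commonly approximated numerically with the Grünwald–Letnikov scheme, a weighted backward-difference sum whose order alpha can be any real number. A generic sketch of that approximation only; how the paper actually embeds FOD into activation functions is not reproduced here:

```python
def gl_fractional_derivative(f, x, alpha, h=1e-3, n=64):
    """Grünwald-Letnikov approximation of the order-alpha derivative of f at x.

    D^alpha f(x) ~ h^(-alpha) * sum_{k=0}^{n-1} (-1)^k C(alpha, k) f(x - k h).
    alpha=0 recovers f itself; alpha=1 recovers the backward difference.
    """
    total, coeff = 0.0, 1.0
    for k in range(n):
        total += coeff * f(x - k * h)
        # recurrence for (-1)^k * binom(alpha, k)
        coeff *= -(alpha - k) / (k + 1)
    return total / h ** alpha
```

Varying `alpha` continuously between integer orders is what gives a fractional activation its extra tunable degree of freedom.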
VHOIP: Video-based Human–Object Interaction recognition with CLIP Prior knowledge
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 133–140 · Pub Date: 2025-02-19 · DOI: 10.1016/j.patrec.2025.02.014
Doyeol Baek, Junsuk Choe
Abstract: In this paper, we introduce a novel approach to recognizing Human–Object Interactions (HOI) in videos, which is crucial for understanding videos focused on human activities. Traditional methods often fall short of accurately identifying subtle interactions, particularly in dynamic sequences involving multiple individuals and objects. To address these issues, we leverage CLIP (Contrastive Language–Image Pre-training), renowned for its rich visual and linguistic knowledge. Our method, Video-based HOI recognition with CLIP Prior knowledge (VHOIP), merges the spatial and temporal analysis capabilities of a video-based HOI framework with CLIP's detailed understanding of interactions, significantly advancing HOI recognition performance. Through rigorous validation on three different HOI recognition datasets, our method demonstrates remarkable improvements over current state-of-the-art techniques, both qualitatively and quantitatively, indicating the effectiveness of our approach.
Citations: 0
Pre-training meets iteration: Learning for robust 3D point cloud denoising
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 105–110 · Pub Date: 2025-02-18 · DOI: 10.1016/j.patrec.2025.02.012
Siwen Quan, Hebin Zhao, Zhao Zeng, Ziming Nie, Jiaqi Yang
Abstract: Point cloud denoising is a crucial task in remote sensing and 3D computer vision, with a significant impact on downstream tasks that rely on high-quality point clouds. Although deep-learning-based point cloud denoising methods have demonstrated outstanding performance, their cross-dataset performance and robustness to high-level noise remain limited. In this letter, we propose a framework called pre-training meets iteration (PMI). It presents a novel perspective that leverages point cloud pre-training for feature encoding within an iterative learning framework for point cloud denoising. Our framework exhibits robust feature encoding capabilities thanks to pre-training, while the iterative denoising architecture progressively refines the data through multiple iterations to reduce noise at various levels. Under the PMI framework, we further propose a method called PMI-MAE-IT based on a point masked auto-encoder and an iterative neural network. Experimental results demonstrate the outstanding cross-dataset performance of our method: compared with state-of-the-art denoising networks, it achieves competitive performance on the PUNet dataset and the best performance when tested on the unseen Kinect dataset. The source code is available at: https://github.com/hb-zhao/PMI.
Citations: 0
Dynamic frequency window transformer for single image deraining
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 111–117 · Pub Date: 2025-02-18 · DOI: 10.1016/j.patrec.2025.02.009
Pengcheng Wang, Yuli Fu, Youjun Xiang, Yufeng Tan
Abstract: Outdoor visual systems often encounter rain streaks that degrade images and affect later high-level tasks. In contrast to methods that perform feature extraction only in the spatial domain, this paper proposes a Dynamic Frequency Window Transformer Network (DFWT-Net) that integrates the frequency- and spatial-domain information of images. First, a Space Frequency Module (SFM) is utilized to extract local features and preliminary frequency characteristics of the image, including amplitude and phase. Subsequently, the residuals of the rain streaks are obtained by a hierarchical encoder–decoder network composed of Dynamic Frequency Window Transformers (DFWTs). The DFWT employs learnable masks for dynamic frequency filtering before self-attention computation to emphasize the frequencies of rain streaks. To obtain rain-free images that resemble real scenes, this paper proposes the Cosine Phase Loss (CPL) function to measure the phase similarity between the recovered image and the ground-truth image. The experimental results thoroughly validate the effectiveness and robustness of the proposed network.
Citations: 0
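The frequency filtering that DFWT builds on can be illustrated generically: transform the image to the frequency domain, apply an elementwise mask, and transform back. This sketch shows only that filtering step with NumPy FFTs; the paper's learnable masks and the surrounding self-attention machinery are not reproduced:

```python
import numpy as np

def frequency_filter(img, mask):
    """Apply a frequency-domain mask: FFT -> elementwise mask -> inverse FFT.

    `mask` has the same shape as `img` and is indexed in the shifted
    (zero-frequency-centered) spectrum. A mask of ones is the identity.
    """
    spec = np.fft.fftshift(np.fft.fft2(img))          # centered spectrum
    filtered = spec * mask                            # elementwise filtering
    return np.real(np.fft.ifft2(np.fft.ifftshift(filtered)))
```

Making `mask` a trainable tensor instead of a fixed array is, in spirit, what turns this fixed filter into the dynamic frequency window described in the abstract.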
Normalized graph compression distance – A novel graph matching framework
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 97–104 · Pub Date: 2025-02-18 · DOI: 10.1016/j.patrec.2025.02.011
Anthony Gillioz, Kaspar Riesen
Abstract: Computing dissimilarities between pairs of graphs is a common task in many pattern recognition applications. A widely used method for this task is graph edit distance (GED). However, computing exact GED is challenging due to its exponential time complexity with respect to the size of the underlying graphs. The major contribution of the present paper is a complementary, and much faster, method to compute dissimilarities between pairs of graphs. Our novel framework involves a compressor-based metric adapted to the graph domain. Basically, the compressor-based metric identifies regularities in compressed graphs and assigns smaller distances to pairs of graphs that are comparable and are thus assumed to belong to the same class. To assess the effectiveness of the proposed graph matching framework, we perform a series of evaluations on eleven real-world datasets. It turns out that the novel matching framework performs as well as, or even better than, GED, yet with significantly lower computation time.
Citations: 0
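Compressor-based metrics typically follow the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. A generic sketch using zlib on byte serializations; the paper's graph-specific adaptation of this idea is not reproduced here:

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 mean the compressor finds shared regularities
    (x helps compress y); values near 1 mean the inputs look unrelated.
    """
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)
```

To use this on graphs, each graph must first be serialized to bytes in a canonical order, so that structurally similar graphs yield similar byte streams; that serialization choice is where the graph-domain adaptation lives.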
Optimal word order for non-causal text generation with Large Language Models: The Spanish case
IF 3.9 · Tier 3, Computer Science
Pattern Recognition Letters · Vol. 190, pp. 89–96 · Pub Date: 2025-02-17 · DOI: 10.1016/j.patrec.2025.02.010
Andrea Busto-Castiñeira, Silvia García-Méndez, Francisco de Arriba-Pérez, Francisco J. González-Castaño
Abstract: The popularity of Natural Language Generation (NLG) has increased owing to progress in Large Language Models (LLMs) with zero-shot inference capabilities. However, most neural systems use decoder-only causal (unidirectional) transformer models, which are effective for English but may reduce the richness of languages with less strict word order, subject omission, or different relative-clause attachment preferences. This is the first work that analytically addresses optimal text generation order for non-causal language models. We present a novel Viterbi-algorithm-based methodology for maximum-likelihood word order estimation. We analyze the non-causal maximum-likelihood order probability for NLG in Spanish and then the probability of generating the same phrases with Spanish causal NLG. This comparative analysis reveals that causal NLG prefers English-like SVO structures. We also analyze the relationship between the optimal generation order and the causal left-to-right generation order using Spearman's rank correlation. Our results demonstrate that the ideal order predicted by the maximum-likelihood estimator is not closely related to the causal order and may be influenced by the syntactic structure of the target sentence.
Citations: 0
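Spearman's rank correlation, used above to compare the estimated optimal order with left-to-right order, reduces to rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)) when there are no ties, with d_i the difference between the ranks of element i in the two orderings. A minimal no-ties sketch:

```python
def spearman_rho(x, y):
    """Spearman's rank correlation between two equal-length sequences.

    Assumes no tied values, so the closed form
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies,
    where d is the per-element difference of ranks.
    """
    n = len(x)
    rank = lambda v: [sorted(v).index(e) for e in v]  # 0-based ranks, no ties
    rx, ry = rank(x), rank(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Identical orderings give rho = 1 and exactly reversed orderings give rho = -1, which is why a low rho between the two generation orders supports the paper's conclusion that they are not closely related.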