Pattern Recognition Letters: Latest Articles

Prompt-based Weakly-supervised Vision-language Pre-training
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-07-09 DOI: 10.1016/j.patrec.2025.06.020
Zixin Guo, Tzu-Jui Julius Wang, Selen Pehlivan, Abduljalil Radman, Min Cao, Jorma Laaksonen
{"title":"Prompt-based Weakly-supervised Vision-language Pre-training","authors":"Zixin Guo ,&nbsp;Tzu-Jui Julius Wang ,&nbsp;Selen Pehlivan ,&nbsp;Abduljalil Radman ,&nbsp;Min Cao ,&nbsp;Jorma Laaksonen","doi":"10.1016/j.patrec.2025.06.020","DOIUrl":"10.1016/j.patrec.2025.06.020","url":null,"abstract":"<div><div>Weakly-supervised Vision-Language Pre-training (W-VLP) explores methods leveraging weak cross-modal supervision, typically relying on object tags generated by a pre-trained object detector (OD) from images. However, training such an OD necessitates dense cross-modal information, including images paired with numerous object-level annotations. To alleviate that requirement, this paper addresses W-VLP in two stages: (1) creating data with weaker cross-modal supervision and (2) pre-training a vision-language (VL) model with the created data. The data creation process involves collecting knowledge from large language models (LLMs) to describe images. Given a category label of an image, its descriptions generated by an LLM are used as the language counterpart. This knowledge supplements what can be obtained using an OD, such as spatial relationships among objects most likely appearing in a scene. To mitigate the noise in the LLM-generated descriptions that destabilizes the training process and may lead to overfitting, we incorporate knowledge distillation and external retrieval-augmented knowledge during pre-training. Furthermore, we present an effective VL model pre-trained with the created data. Empirically, despite its weaker cross-modal supervision, our pre-trained VL model notably outperforms other W-VLP works in image and text retrieval tasks, e.g., VLMixer by 17.7% on MSCOCO and RELIT by 11.25% on Flickr30K relatively in Recall@1 in text-to-image retrieval task. It also shows superior performance on other VL downstream tasks, making a big stride towards matching the performances of strongly supervised VLP models. The results reveal the effectiveness of the proposed W-VLP methodology.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 8-15"},"PeriodicalIF":3.9,"publicationDate":"2025-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144653943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Trusty Visual Intelligence Model for Leather Defect Detection Using ConvNeXtBase and Coyote Optimized Extra Tree
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-07-07 DOI: 10.1016/j.patrec.2025.06.019
Brij B. Gupta, Akshat Gaurav, Razaz Waheeb Attar, Varsha Arya, Ahmed Alhomoud
{"title":"Trusty Visual Intelligence Model for Leather Defect Detection Using ConvNeXtBase and Coyote Optimized Extra Tree","authors":"Brij B. Gupta ,&nbsp;Akshat Gaurav ,&nbsp;Razaz Waheeb Attar ,&nbsp;Varsha Arya ,&nbsp;Ahmed Alhomoud","doi":"10.1016/j.patrec.2025.06.019","DOIUrl":"10.1016/j.patrec.2025.06.019","url":null,"abstract":"<div><div>The leather industry continuously strives to ensure high product quality, yet defects often arise during stages like tanning, dyeing, and material handling. Traditional manual inspections are inconsistent, creating a need for automated, reliable visual intelligence systems. This paper introduces a Trusty Visual Intelligence Model for Leather Defect Detection Using ConvNeXtBase and Coyote Optimized Extra Tree. ConvNeXtBase is utilized for feature extraction, while an ExtraTreesClassifier, optimized with the Coyote Optimization Algorithm (COA), is employed for accurate defect classification, identifying issues like grain off, loose grains, and pinholes. Comparative analysis with models such as SVM, XGBoost, and LGBMClassifier demonstrates superior accuracy (0.90), precision, recall, and F1 score. The COA-optimized ExtraTreesClassifier is efficient and effective, making it ideal for real-time industrial applications.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 312-318"},"PeriodicalIF":3.9,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144580384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Automatic canine emotion recognition through multimodal approach
IF 3.3, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-07-05 DOI: 10.1016/j.patrec.2025.06.018
Eliaf Garcia-Loya, Irvin Hussein Lopez-Nava, Humberto Pérez-Espinosa, Veronica Reyes-Meza, Mariel Urbina-Escalante
{"title":"Automatic canine emotion recognition through multimodal approach","authors":"Eliaf Garcia-Loya ,&nbsp;Irvin Hussein Lopez-Nava ,&nbsp;Humberto Pérez-Espinosa ,&nbsp;Veronica Reyes-Meza ,&nbsp;Mariel Urbina-Escalante","doi":"10.1016/j.patrec.2025.06.018","DOIUrl":"10.1016/j.patrec.2025.06.018","url":null,"abstract":"<div><div>This study introduces a comprehensive multimodal approach for analyzing and classifying emotions in dogs, combining visual, inertial, and physiological data to improve emotion recognition performance. The research focuses on the dimensions of valence and arousal to categorize dog emotions into four quadrants: playing, frustration, abandonment, and petting. A custom-developed device (PATITA) was used for synchronized data collection to which a feature extraction process based on windowing was done. Dimensionality reduction and feature selection techniques were applied to identified most relevant features across data types. Then, several unimodal and multimodal classification models, including Naïve Bayes, SVM, ExtraTrees, and kNN, were trained and evaluated. Experimental results demonstrated the superiority of the multimodal approach, with ExtraTrees classifier consistently yielding the best results (F1-score = 0.96), using the reduced feature set. In conclusion, this work presents a robust multimodal framework for canine emotion recognition, providing a foundation for future studies to refine techniques and overcome current limitations, particularly through more sophisticated models and expanded data collection.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 351-357"},"PeriodicalIF":3.3,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144931663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Dual-branch scale disentanglement for text–video retrieval
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-07-03 DOI: 10.1016/j.patrec.2025.06.014
Hyunjoon Koo, Jungkyoo Shin, Eunwoo Kim
{"title":"Dual-branch scale disentanglement for text–video retrieval","authors":"Hyunjoon Koo ,&nbsp;Jungkyoo Shin ,&nbsp;Eunwoo Kim","doi":"10.1016/j.patrec.2025.06.014","DOIUrl":"10.1016/j.patrec.2025.06.014","url":null,"abstract":"<div><div>In multi-modal understanding, text–video retrieval task, which aims to align videos with the corresponding texts, has gained increasing attention. Previous studies involved aligning fine-grained and coarse-grained features of videos and texts using a single model framework. However, the inherent differences between local and global features may result in entangled representations, leading to sub-optimal results. To address this issue, we introduce an approach to disentangle distinct modality features. Using a dual-branch structure, our method projects local and global features into distinct latent spaces. Each branch employs a different neural network and a loss function, facilitating independent learning of each feature and effectively capturing detailed and comprehensive features. We demonstrate the effectiveness of our method for text–video retrieval task across three different benchmarks, showing improvements over existing methods. It outperforms the compared methods by an average of +1.0%, +0.9%, and +0.6% in R@1 on MSR-VTT, LSMDC and MSVD, respectively</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 296-302"},"PeriodicalIF":3.9,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Cross-domain 3D model classification via pseudo-labeling noise correction
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-07-03 DOI: 10.1016/j.patrec.2025.06.013
Tong Zhou, Mofei Song
{"title":"Cross-domain 3D model classification via pseudo-labeling noise correction","authors":"Tong Zhou,&nbsp;Mofei Song","doi":"10.1016/j.patrec.2025.06.013","DOIUrl":"10.1016/j.patrec.2025.06.013","url":null,"abstract":"<div><div>Unsupervised domain adaptation (UDA) with pseudo-labeling has become a key approach for cross-domain 3D model classification. Although it effectively narrows the gap between domains, the performance of existing UDA methods will drop significantly when applied to multi-category and multi-scene 3D model classification due to the dependence on 3D source domain labels and the impact of low-quality pseudo-labels. In this paper, we address this challenge by proposing an innovative cross-domain 3D model classification framework based on 2D–3D UDA and pseudo-label correction mechanism. Our method fully utilizes the rich semantic labels and scene information in the image domain for efficient image-to-3D cross-domain adaptation, completely eliminating the dependence on 3D labels. In addition, we introduce sufficient prior knowledge in the image domain to guide the adversarial training of the pseudo-label correction module. The introduction of cross-modal information improves the quality of pseudo-labels in cross-domain 3D classification, breaking the limitation of existing label denoising mechanisms that are limited to a single modality. Experimental results on multiple standard 3D model datasets and cross-domain generalization tasks show that this method outperforms existing mainstream 3D UDA methods in terms of robustness and classification performance, verifying its practicality and generalization ability without relying on 3D data annotation.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 303-311"},"PeriodicalIF":3.9,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144580383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
ALAT: Adversarial Label-guided Adversarial Training
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-07-01 DOI: 10.1016/j.patrec.2025.06.012
Nan Wang, Yong Yu, Honghong Wang
{"title":"ALAT: Adversarial Label-guided Adversarial Training","authors":"Nan Wang ,&nbsp;Yong Yu ,&nbsp;Honghong Wang","doi":"10.1016/j.patrec.2025.06.012","DOIUrl":"10.1016/j.patrec.2025.06.012","url":null,"abstract":"<div><div>Adversarial training is a widely used defense method in deep neural networks that enhances the model’s ability to detect perturbations and improve network robustness. Previous studies have assessed adversarial attack performance from various angles, leading to enhancements in adversarial training to bolster network robustness. However, while some research has explored the effectiveness of misclassified data in improving adversarial training, they have ignored the importance of the adversarial predicted labels. We observe that adversarial sample prediction labels often correspond to high probability categories in natural predictions. This paper proposes a new adversarial training method called <strong>Adversarial Label-guided Adversarial Training (ALAT)</strong>. This method incorporates an additional regularization term that integrates adversarial prediction labels into the training process, guiding predictions closer to true labels and away from adversarial labels. Extensive experiments confirm its effectiveness.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 250-256"},"PeriodicalIF":3.9,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144524024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
EPRVFL: A fast and scalable model for real-time fake news detection
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-06-30 DOI: 10.1016/j.patrec.2025.06.006
Rajiv Kumar Gurjwar, Alok Kumar, Udai Pratap Rao
{"title":"EPRVFL: A fast and scalable model for real-time fake news detection","authors":"Rajiv Kumar Gurjwar ,&nbsp;Alok Kumar ,&nbsp;Udai Pratap Rao","doi":"10.1016/j.patrec.2025.06.006","DOIUrl":"10.1016/j.patrec.2025.06.006","url":null,"abstract":"<div><div>The widespread dissemination of fake news on social media platforms emphasizes the need for efficient detection methods. In this research, we propose the Embedding Privileged Random Vector Functional Link (EPRVFL) model to improve the real-time detection of fake news while minimizing the inference time. In the proposed EPRVFL model, the text data with diverse datasets, PolitiFact, LIAR2, and BuzzFeed-Webis, were preprocessed and tokenized for analysis to ensure adaptability in various scenarios. The proposed model employs a shallow neural network with a connection between input and output layers to efficiently classify and integrate experimentally finalized bidirectional encoder representations from transformers (BERT) embeddings.</div><div>The proposed model achieves inference times of 0.0011 s, 0.0208 s, and 0.0053 s while maintaining high accuracies of 91.7722%, 74.6516%, and 70.3703% on PolitiFact, LIAR2, and BuzzFeed-Webis, respectively. Compared to CNN (1.3395 s, 72.0819%) and BiGRU (0.5518 s, 73.9547%), the EPRVFL ensures significantly faster inference with competitive accuracy. While BiLSTM achieves a higher precision (98.3471%) on PolitiFact, it requires 0.7674 s, making it less efficient in real-time scenarios. Similarly, FFNN shows the fastest inference (0.1103 s) but struggles with accuracy (59.4595%) on BuzzFeed-Webis. The proposed model’s balanced performance across precision, recall and F1 scores reinforces its robustness in fake news detection. The proposed EPRVFL model uniquely integrates BERT-base embeddings with a lightweight neural structure, ensuring rapid inference while maintaining robust accuracy, making it ideal for real-time applications. These findings provide analytical evidence for the model’s applicability in large-scale scenarios and the potential for future research by incorporating enhanced context analysis.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 267-273"},"PeriodicalIF":3.9,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144524294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Text optimization with latent inversion for non-rigid image editing
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-06-28 DOI: 10.1016/j.patrec.2025.06.011
Yunji Jung, Seokju Lee, Tair Djanibekov, Jong Chul Ye, Hyunjung Shim
{"title":"Text optimization with latent inversion for non-rigid image editing","authors":"Yunji Jung ,&nbsp;Seokju Lee ,&nbsp;Tair Djanibekov ,&nbsp;Jong Chul Ye ,&nbsp;Hyunjung Shim","doi":"10.1016/j.patrec.2025.06.011","DOIUrl":"10.1016/j.patrec.2025.06.011","url":null,"abstract":"<div><div>Text-guided non-rigid image editing involves complex edits for input images, such as changing motion or compositions of the object (e.g., making a horse jump or adding candles on a cake). Since it requires manipulating the structure of the object, existing methods often compromise “image identity”– defined as the overall object appearance and background details – particularly when combined with Stable Diffusion. In this work, we propose a new approach for non-rigid image editing with Stable Diffusion, aimed at improving the image identity preservation quality without compromising editability. Our approach comprises three stages: text optimization, latent inversion, and timestep-aware text injection sampling. Inspired by the success of Imagic, we employ their text optimization for smooth editing. Then, we introduce latent inversion to preserve the input image’s identity without additional model fine-tuning. To fully utilize the input reconstruction ability of latent inversion, we employ timestep-aware text injection sampling, strategically injecting the source text prompt in early sampling steps and then transitioning to the target prompt in subsequent sampling steps. This strategic approach seamlessly harmonizes with text optimization, facilitating complex non-rigid edits to the input without losing the original identity. We demonstrate the effectiveness of our method in terms of identity preservation, editability, and aesthetic quality through extensive experiments. Our code is available at <span><span>https://github.com/YunjiJung0105/TOLI-non-rigid-editing</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 281-288"},"PeriodicalIF":3.9,"publicationDate":"2025-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144534264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
A novel industrial thermoelectric cooler component defect vision transformer detector based on local and global features fusion
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-06-27 DOI: 10.1016/j.patrec.2025.06.022
Jie Tu, Mengjie Tang, Yong Han, Daren Wei, Kelvin K.L. Wong
{"title":"A novel industrial thermoelectric cooler component defect vision transformer detector based on local and global features fusion","authors":"Jie Tu ,&nbsp;Mengjie Tang ,&nbsp;Yong Han ,&nbsp;Daren Wei ,&nbsp;Kelvin K.L. Wong","doi":"10.1016/j.patrec.2025.06.022","DOIUrl":"10.1016/j.patrec.2025.06.022","url":null,"abstract":"<div><div>Thermoelectric coolers (TECs) are crucial in industries requiring precise temperature control, such as electronics, telecommunications, aerospace, and semiconductor manufacturing. During the manufacturing process of TEC components, defects including cracks, pits, and contamination frequently occur, compromising performance and service life. Traditional manual inspection methods are inefficient and error-prone, motivating the need for an automated and accurate defect detection approach. To address these challenges posed by the subtle, diverse, and randomly distributed defects on TEC components, we propose the Local Feature Enhance and Feature Fusion Network (LFEFFN), a hybrid model integrating convolutional neural networks (CNNs) and Transformer architectures to simultaneously capture local details and global contextual information. Specifically, the model enhances the traditional patch embedding module using affine transformations and overlapping convolutional layers, incorporates a Local Feature Extraction Module (LFEM) based on depthwise separable convolutions, and employs a Global-to-Local Feature Fusion Module (GLFM) to effectively merge features. Extensive experiments were conducted on a custom TEC dataset of 4800 images representing seven defect states, employing stratified sampling for training, validation, and testing. Cross-domain validation was also performed using the publicly available DAGM 2007 dataset. The LFEFFN achieved a Top-1 accuracy of 94.73 % and a macro-average F1 score of 0.934, outperforming state-of-the-art CNN-based and Transformer-based models. Robustness evaluations under varied lighting (±50 %), rotation (±30°), and resolution changes (50 % and 150 %) demonstrated minimal performance degradation, confirming the model's resilience in complex industrial environments. Cross-domain testing on the DAGM 2007 dataset yielded a Top-1 accuracy of 85.62 %, highlighting the model's strong generalization ability. Ablation studies further validated the contributions of each module and parameter configuration, and deployment analysis showed an average inference time of 0.05 s per image, satisfying real-time industrial application requirements.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 257-266"},"PeriodicalIF":3.9,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144524034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0
Coarse to fine image matching by mining matchable regions and geometric cues
IF 3.9, CAS Tier 3, Computer Science
Pattern Recognition Letters Pub Date: 2025-06-26 DOI: 10.1016/j.patrec.2025.06.009
Qingqun Kong, Zhili Qiu, Yiming Zheng, Kehu Yang, Bin Fan
{"title":"Coarse to fine image matching by mining matchable regions and geometric cues","authors":"Qingqun Kong ,&nbsp;Zhili Qiu ,&nbsp;Yiming Zheng ,&nbsp;Kehu Yang ,&nbsp;Bin Fan","doi":"10.1016/j.patrec.2025.06.009","DOIUrl":"10.1016/j.patrec.2025.06.009","url":null,"abstract":"<div><div>Detector-free image matchers have shown promising results in handling challenging cases of image matching. Their coarse-to-fine matching pipeline is particularly prone to incorrect matches in the coarse matching stage. This paper proposes to enhance coarse features by focusing attention learning more on matchable regions and to improve coarse match accuracy by exploring the geometric consistency among matches. For the enhanced feature extraction module, a regional attention mechanism is used in addition to the widely used global attention for self-/cross-feature interaction. For the feature matching module, a second-order geometric relation-induced matching confidence is proposed. These two modules respectively explore appearance and geometric cues to improve the quality of coarse matches and can be seamlessly integrated into existing coarse-to-fine matching pipelines. The effectiveness of the proposed method has been extensively validated on two popular coarse-to-fine matching pipelines (LoFTR and ASpanFormer), demonstrating improved performance on various image matching downstream tasks.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"196 ","pages":"Pages 289-295"},"PeriodicalIF":3.9,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 0