Pattern Recognition: Latest Articles

Domain adaptive depth completion via spatial-error consistency
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111645 | Pub Date: 2025-04-15 | DOI: 10.1016/j.patcog.2025.111645
Authors: Lingyu Xiao, Jinhui Wu, Junjie Hu, Ziyu Li, Wankou Yang
Abstract: In this paper, we introduce a novel training framework designed to address the challenge of unsupervised domain adaptation (UDA) in depth completion. Our framework bridges the gap between LiDAR and image data by establishing a shared domain: a collection of confidence maps for the network's predictions. By indirectly adapting the depth network through this common domain, the problem is decomposed into two key tasks: (1) constructing the common domain and (2) adapting the depth network using it. To construct the common domain, errors in the network's predictions are modelled as confidence, which supervises a sub-module called the Depth Completion Plugin (DCPlugin) whose purpose is to generate the confidence associated with any given dense depth prediction. To adapt the depth network using the common domain, a confidence-aware co-training task is employed, leveraging the confidence map provided by the well-adapted DCPlugin. We evaluate the proposed approach on multiple depth networks under the adaptation scenarios CARLA → KITTI and VKITTI → KITTI. The results demonstrate that our method surpasses other domain adaptation (DA) techniques, achieving state-of-the-art performance. Given the limited existing work in this area, we provide comprehensive discussions to guide future researchers.
Citations: 0
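The error-to-confidence idea is easy to picture in code. Below is a minimal, hypothetical sketch (the module shape, input channels, and the exponential error-to-confidence mapping are all assumptions, not the paper's implementation) of a plugin head trained to predict per-pixel confidence for a dense depth prediction:

```python
# Illustrative sketch only: the real DCPlugin is not reproduced here.
import torch
import torch.nn as nn

class DCPlugin(nn.Module):
    """Toy confidence head: maps RGB plus a dense depth prediction
    to a per-pixel confidence map in (0, 1)."""
    def __init__(self, in_channels=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, rgb, depth_pred):
        return self.head(torch.cat([rgb, depth_pred], dim=1))

def confidence_target(depth_pred, depth_gt, sigma=1.0):
    # Model prediction error as confidence: small error -> confidence near 1.
    return torch.exp(-torch.abs(depth_pred - depth_gt) / sigma)

rgb = torch.rand(2, 3, 64, 64)
depth_pred = torch.rand(2, 1, 64, 64)
depth_gt = torch.rand(2, 1, 64, 64)
plugin = DCPlugin()
loss = nn.functional.mse_loss(plugin(rgb, depth_pred),
                              confidence_target(depth_pred, depth_gt))
loss.backward()
```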
Rethinking transformers with convolution and graph embeddings for few-shot molecular property discovery
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111657 | Pub Date: 2025-04-15 | DOI: 10.1016/j.patcog.2025.111657
Authors: Luis H.M. Torres, Joel P. Arrais, Bernardete Ribeiro
Abstract: The prediction of molecular properties is a critical step in drug discovery campaigns. Computational methods such as graph neural networks (GNNs) and Transformers have effectively leveraged the short-range and long-range dependencies in molecules to preserve local and global patterns across multiple molecular property prediction tasks. However, these models' dependence on large amounts of experimental data poses a challenge, particularly for the smaller biological datasets prevalent across the drug discovery pipeline. This paper introduces FS-GCvTR, a few-shot graph-based convolutional Transformer architecture designed to predict chemical properties from a small number of labeled compounds. The convolutional Transformer is the crucial component: it integrates both local and global dependencies of molecular graph embeddings by propagating a set of convolutional tokens across Transformer attention layers for molecular property prediction. Furthermore, a few-shot meta-learning approach is introduced to iteratively adapt model parameters across multiple few-shot tasks while generalizing to new chemical properties with limited available data. Experiments including few-shot evaluations on multi-property datasets show that FS-GCvTR outperforms other few-shot graph-based baselines on specific molecular property prediction tasks.
Citations: 0
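To make "convolutional tokens propagated across attention layers" concrete, here is a toy PyTorch sketch; the 1-D tokenizer, the pooling step, and every dimension are illustrative assumptions rather than the FS-GCvTR architecture:

```python
# Sketch: local conv tokens from node embeddings, global Transformer attention.
import torch
import torch.nn as nn

class GraphConvTokenTransformer(nn.Module):
    def __init__(self, node_dim=64, d_model=128, n_tokens=16):
        super().__init__()
        # Convolution over the node-embedding sequence yields local tokens.
        self.tokenizer = nn.Conv1d(node_dim, d_model, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(n_tokens)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # one property logit per task

    def forward(self, node_emb):                      # (batch, nodes, node_dim)
        x = self.tokenizer(node_emb.transpose(1, 2))  # (batch, d_model, nodes)
        x = self.pool(x).transpose(1, 2)              # (batch, n_tokens, d_model)
        x = self.encoder(x)                           # global attention over tokens
        return self.head(x.mean(dim=1))               # pooled prediction

model = GraphConvTokenTransformer()
print(model(torch.rand(8, 40, 64)).shape)  # torch.Size([8, 1])
```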
IFShip: Interpretable fine-grained ship classification with domain knowledge-enhanced vision-language models
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111672 | Pub Date: 2025-04-15 | DOI: 10.1016/j.patcog.2025.111672
Authors: Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao
Abstract: End-to-end interpretation currently dominates the remote sensing fine-grained ship classification (RS-FGSC) task. However, the inference process remains uninterpretable, leading to criticisms of these models as "black box" systems. To address this issue, we propose a domain knowledge-enhanced Chain-of-Thought (CoT) prompt generation mechanism, used to semi-automatically construct a task-specific instruction-following dataset, TITANIC-FGS. By training on TITANIC-FGS, we adapt general-domain vision-language models (VLMs) to the FGSC task, yielding a model named IFShip. Building upon IFShip, we develop an FGSC visual chatbot that redefines the FGSC problem as a step-by-step reasoning task and conveys the reasoning process in natural language. Experimental results show that IFShip outperforms state-of-the-art FGSC algorithms in both interpretability and classification accuracy. Furthermore, compared to VLMs such as LLaVA and MiniGPT-4, IFShip demonstrates superior performance on the FGSC task. It provides an accurate chain of reasoning when fine-grained ship types are recognizable to the human eye and offers interpretable explanations when they are not.
Citations: 0
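A domain knowledge-enhanced CoT prompt can be pictured as a template that turns expert visual cues into step-by-step reasoning text. The sketch below is hypothetical: the cue dictionary, field names, and wording are invented for illustration and do not reproduce TITANIC-FGS's actual templates:

```python
# Hypothetical instruction-following sample builder for a CoT dataset.
SHIP_KNOWLEDGE = {
    "aircraft carrier": ["a flat full-length flight deck",
                         "an island superstructure offset to one side"],
    "container ship": ["a box-shaped hull",
                       "stacked containers amidships"],
}

def build_cot_sample(image_id: str, label: str) -> dict:
    cues = SHIP_KNOWLEDGE[label]
    reasoning = " ".join(
        f"Step {i + 1}: check for {cue}." for i, cue in enumerate(cues)
    )
    return {
        "image": image_id,
        "instruction": "Identify the fine-grained ship type and explain "
                       "your reasoning step by step.",
        "response": f"{reasoning} These cues together indicate a {label}.",
    }

print(build_cot_sample("scene_0042.png", "container ship"))
```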
A dynamic predictive transformer with temporal relevance regression for action detection
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111644 | Pub Date: 2025-04-14 | DOI: 10.1016/j.patcog.2025.111644
Authors: Matthew Korban, Peter Youngs, Scott T. Acton
Abstract: This paper introduces a novel transformer network tailored to skeleton-based action detection in untrimmed long video streams. Our approach centers on three innovative mechanisms that collectively enhance the network's temporal analysis capabilities. First, a new predictive attention mechanism incorporates future frame data into the sequence analysis during the training phase. This mechanism addresses an essential shortcoming of current action detection models, incomplete temporal modeling in long action sequences, particularly for boundary frames that lie outside the network's immediate temporal receptive field, while maintaining computational efficiency. Second, we integrate a new adaptive weighted temporal attention system that dynamically evaluates the importance of each frame within an action sequence. In contrast to existing approaches, the proposed weighting strategy is both adaptive and interpretable, making it highly effective for long sequences with numerous non-informative frames. Third, the network incorporates an advanced regression technique that identifies the start and end frames independently, based on their relevance to different frames. Unlike existing homogeneous regression methods, the proposed regression is heterogeneous and draws on various temporal relationships, including those involving future frames in actions, making it more effective for action detection. Extensive experiments on prominent untrimmed skeleton-based action datasets (PKU-MMD, OAD, and Charades) demonstrate the effectiveness of this network.
Citations: 0
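The adaptive frame-weighting idea can be sketched as a learned scorer whose softmax weights rescale per-frame features; the module below is an assumption about the mechanism's general shape, not the paper's code:

```python
# Sketch of adaptive, interpretable per-frame weighting over a long sequence.
import torch
import torch.nn as nn

class AdaptiveTemporalWeighting(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.scorer = nn.Linear(feat_dim, 1)

    def forward(self, frames):                # (batch, time, feat_dim)
        weights = torch.softmax(self.scorer(frames), dim=1)  # (batch, time, 1)
        # Weights sum to 1 over time, suppressing non-informative frames;
        # the weights themselves can be inspected for interpretability.
        return frames * weights, weights

module = AdaptiveTemporalWeighting()
feats, w = module(torch.rand(4, 120, 256))
print(feats.shape, w.squeeze(-1)[0, :5])
```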
Secure reversible privacy protection for face multiple attribute editing
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111662 | Pub Date: 2025-04-12 | DOI: 10.1016/j.patcog.2025.111662
Authors: Yating Zeng, Xinpeng Zhang, Guorui Feng
Abstract: The demand for face attribute editing is increasing across applications such as digital media and virtual reality. However, while existing methods can achieve high-quality multi-attribute editing, they often struggle to balance privacy protection and image reversibility, and are prone to causing undesired changes in non-target attributes. To address these issues, we propose a novel Multi-Layer Mapping and Password Fusion (M-LMPF) framework for efficient and flexible face attribute editing with reversible privacy protection. Our approach integrates multi-attribute editing with secure, reversible image attribute protection, enabling precise control over the modification of target attributes while preserving facial identity consistency and avoiding changes to other attributes. The framework employs a deep multi-layer latent mapping network that embeds password information at different granular levels of the latent space, allowing fine-grained control over facial features. Additionally, we introduce a new encryption and decryption mechanism to ensure reversible editing of specific attributes, effectively preventing unauthorized access. Extensive experiments demonstrate that the M-LMPF framework outperforms state-of-the-art methods in attribute editing accuracy, reversibility, identity consistency, and image quality.
Citations: 0
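The reversibility property hinges on the password deterministically regenerating the same latent perturbations. Below is a speculative sketch of that idea only: the hash-based keying, per-layer additive offsets, layer count, and latent size are all assumptions, not the M-LMPF design:

```python
# Speculative sketch: password-seeded, exactly invertible latent offsets.
import hashlib
import torch

def password_offsets(password: str, n_layers=3, latent_dim=512, scale=0.05):
    seed = int.from_bytes(hashlib.sha256(password.encode()).digest()[:8], "big")
    gen = torch.Generator().manual_seed(seed)
    # Same password -> same generator seed -> identical offsets every call.
    return [scale * torch.randn(latent_dim, generator=gen) for _ in range(n_layers)]

latents = [torch.rand(512) for _ in range(3)]        # per-layer face latents
protected = [w + o for w, o in zip(latents, password_offsets("hunter2"))]
recovered = [w - o for w, o in zip(protected, password_offsets("hunter2"))]
print(torch.allclose(recovered[0], latents[0]))      # True: reversible with the key
```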
Separation of Unknown Features and Samples for Unbiased Source-free Open Set Domain Adaptation
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111661 | Pub Date: 2025-04-12 | DOI: 10.1016/j.patcog.2025.111661
Authors: Fu Li, Yifan Lan, Yuwu Lu, Wai Keung Wong, Ming Zhao, Zhihui Lai, Xuelong Li
Abstract: Open Set Domain Adaptation (OSDA) trains a model on a source domain so that it performs well on a target domain exhibiting both domain discrepancy and unknown-class samples absent from the source domain. Recently, Source-free Open Set Domain Adaptation (SF-OSDA) has aimed to achieve OSDA without access to source-domain samples. Existing SF-OSDA methods focus only on the known-class samples in the target domain and overlook its abundant unknown-class semantics. To address these issues, we propose a Separation of Unknown Features and Samples (SUFS) method for unbiased SF-OSDA. Specifically, SUFS consists of a Sample Feature Separation (SFS) module that separates the private features from the known features in each sample. This module not only utilizes the semantic information of each sample's label, but also explores each sample's potential unknown information. We then integrate a Feature Correlation Representation (FCR) module, which computes the similarity between each sample and its neighboring samples to correct semantic bias and build instance-level decision boundaries. Extensive experiments in the SF-OSDA scenario demonstrate the effectiveness of SUFS. In addition, SUFS also performs strongly in the Source-free Partial Domain Adaptation (SF-PDA) scenario.
Citations: 0
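One way to picture the feature-separation step is as two heads that split each target feature into a known-class part and a private part, with the private part's relative magnitude scoring how likely the sample is to be unknown-class. This is purely an illustrative guess at the mechanism's shape, not the SFS module itself:

```python
# Illustrative guess: split features, score "unknownness" by the private part.
import torch
import torch.nn as nn

class SampleFeatureSeparation(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.known_head = nn.Linear(dim, dim)    # known-class component
        self.private_head = nn.Linear(dim, dim)  # sample-private component

    def forward(self, feat):
        return self.known_head(feat), self.private_head(feat)

sfs = SampleFeatureSeparation()
known, private = sfs(torch.rand(16, 256))
# Samples dominated by the private component are unknown-class candidates.
unknown_score = private.norm(dim=1) / (known.norm(dim=1) + private.norm(dim=1))
print(unknown_score[:4])
```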
Auto-adjustable dual-information graph regularized NMF for multiview data clustering
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111679 | Pub Date: 2025-04-12 | DOI: 10.1016/j.patcog.2025.111679
Authors: Shuo Li, Chen Yang, Hui Guo
Abstract: Multiview data processing has gained significant attention in machine learning due to its ability to integrate complementary information from diverse data sources. Among multiview clustering methods, non-negative matrix factorization (NMF)-based approaches have shown strong potential. However, existing methods rely on fixed, single loss functions and single manifold regularization terms, which limits their adaptability to diverse and heterogeneous datasets. To address these challenges, we propose multiview auto-adjustable robust dual-information graph regularized non-negative matrix factorization (MARDNMF). This method introduces a novel set of dynamically adjustable loss functions, each incorporating two correntropy terms tuned via adaptive parameters based on the data characteristics. Additionally, MARDNMF leverages multi-scale k-nearest neighbor (KNN) graphs to build a dual-information graph regularization term that captures both local and discriminative manifold information. Experimental results across various datasets demonstrate that MARDNMF outperforms existing NMF-based methods in both single-view and multiview clustering scenarios, offering enhanced robustness and adaptability.
Citations: 0
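The graph-regularized NMF backbone that this line of work builds on is a standard algorithm, sketched below in NumPy with multiplicative updates. The adaptive correntropy losses and the multi-scale KNN dual-information graph of MARDNMF are omitted; the λ, k, and random affinity graph here are assumptions for the demo:

```python
# GNMF-style sketch: minimize ||X - U V^T||^2 + lam * Tr(V^T L V), L = D - A.
import numpy as np

def graph_nmf(X, A, k=5, lam=0.1, iters=200, eps=1e-9):
    """X: (features, samples), non-negative. A: (samples, samples) affinity."""
    rng = np.random.default_rng(0)
    m, n = X.shape
    U = rng.random((m, k))
    V = rng.random((n, k))
    D = np.diag(A.sum(axis=1))                      # graph degree matrix
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)        # standard NMF update
        V *= (X.T @ U + lam * (A @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

rng = np.random.default_rng(1)
X = rng.random((30, 100))                           # 30 features, 100 samples
A = (rng.random((100, 100)) > 0.9).astype(float)
A = np.maximum(A, A.T)                              # symmetric KNN-like affinity
U, V = graph_nmf(X, A)
print(V.argmax(axis=1)[:10])                        # crude cluster assignments
```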
Masked auto-encoding and scatter-decoupling transformer for polarimetric SAR image classification
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111660 | Pub Date: 2025-04-12 | DOI: 10.1016/j.patcog.2025.111660
Authors: Jie Geng, Lijia Dong, Yuhang Zhang, Wen Jiang
Abstract: Pixel-level annotation of polarimetric SAR (PolSAR) images is difficult and labor-intensive, so deep learning-based PolSAR image classification generally faces the challenge of scarce labeled data. To address this issue, we propose a self-supervised learning model based on a masked auto-encoding and scatter-decoupling transformer (MAST) for PolSAR image classification, which aims to make full use of large amounts of unlabeled data. Combined with PolSAR scattering characteristics, an effective pre-training auxiliary task is designed to constrain the model to learn spatial information and global scattering representations from SAR images. In the fine-tuning stage, a scattering embedding module strengthens the representation of global semantic information with specific scattering characteristics. In addition, a supervised contrastive loss is introduced to improve the robustness of the classifier. Extensive experiments on three public PolSAR datasets demonstrate the effectiveness of the proposed method.
Citations: 0
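The masked auto-encoding pre-training step can be illustrated with MAE-style random patch masking: hide most patch tokens and train the model to reconstruct them from the few that remain. Patch count, mask ratio, and the absence of any scattering-specific head are assumptions of this sketch:

```python
# MAE-style random masking over patch tokens of a PolSAR tile.
import torch

def random_masking(tokens, mask_ratio=0.75):
    """tokens: (batch, n, dim). Keep a random subset; return kept tokens + ids."""
    b, n, d = tokens.shape
    n_keep = int(n * (1 - mask_ratio))
    noise = torch.rand(b, n)
    ids_keep = noise.argsort(dim=1)[:, :n_keep]   # random subset per sample
    kept = torch.gather(tokens, 1,
                        ids_keep.unsqueeze(-1).expand(-1, -1, d))
    return kept, ids_keep

tokens = torch.rand(2, 196, 64)   # e.g. 14x14 patches, 64-dim embeddings
kept, ids = random_masking(tokens)
print(kept.shape)                 # torch.Size([2, 49, 64]): encoder sees 25%
```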
Cascaded Physical-constraint Conditional Variational Auto Encoder with socially-aware diffusion for pedestrian trajectory prediction
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111667 | Pub Date: 2025-04-12 | DOI: 10.1016/j.patcog.2025.111667
Authors: Haojie Chen, Zhuo Wang, Hongde Qin, Xiaokai Mu
Abstract: Pedestrian trajectory prediction is a crucial prerequisite for tasks such as autonomous driving and human–robot interaction. Existing methods mainly leverage deep learning-based generative models to predict future multi-modal trajectories. Nevertheless, the inherent uncertainty in pedestrian movements makes it challenging for deep generative models to generate accurate and plausible future trajectories. In this paper, we propose a two-stage trajectory prediction network termed CPSD. In the first stage, a Cascaded Physical-constraint Conditional Variational Auto Encoder combines Differentiable Physical Constraint Conditional Variational Auto Encoders in cascaded form to predict trajectory coordinates in a stepwise manner, which improves the interpretability of the deep generative network and alleviates the accumulation of prediction error over time. In the second stage, a Socially-aware Diffusion Model refines the initial trajectory generated in the first stage. By introducing a non-local attention mechanism and constructing a social mask, we integrate pedestrian social interactions into the diffusion model, enabling the refinement of more realistic and plausible multi-modal pedestrian trajectories. Extensive experiments on the public SDD and ETH/UCY datasets demonstrate that CPSD produces more promising pedestrian trajectories than other state-of-the-art trajectory prediction algorithms.
Citations: 0
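A single cascade stage can be pictured as a small conditional VAE over future coordinates given the observed past. The sketch below is a generic CVAE with the standard reparameterization and ELBO terms; all dimensions and the KL weight are assumptions, and the physical constraints and diffusion refinement of CPSD are not modelled:

```python
# Generic conditional-VAE training step for trajectory prediction.
import torch
import torch.nn as nn

class TrajCVAE(nn.Module):
    def __init__(self, obs_len=8, pred_len=12, z_dim=16, h=64):
        super().__init__()
        self.enc_past = nn.Sequential(nn.Linear(obs_len * 2, h), nn.ReLU())
        self.to_stats = nn.Linear(h + pred_len * 2, 2 * z_dim)  # q(z | past, future)
        self.dec = nn.Sequential(nn.Linear(h + z_dim, h), nn.ReLU(),
                                 nn.Linear(h, pred_len * 2))

    def forward(self, past, future):   # past: (b, obs_len*2), future: (b, pred_len*2)
        c = self.enc_past(past)
        mu, logvar = self.to_stats(torch.cat([c, future], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        recon = self.dec(torch.cat([c, z], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return nn.functional.mse_loss(recon, future) + 0.1 * kl

model = TrajCVAE()
loss = model(torch.rand(32, 16), torch.rand(32, 24))
loss.backward()
```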
MLLM as video narrator: Mitigating modality imbalance in video moment retrieval
IF 7.5 | CAS Q1 | Computer Science
Pattern Recognition, Vol. 166, Article 111670 | Pub Date: 2025-04-11 | DOI: 10.1016/j.patcog.2025.111670
Authors: Weitong Cai, Jiabo Huang, Shaogang Gong, Hailin Jin, Yang Liu
Abstract: Video Moment Retrieval (VMR) aims to localize a specific temporal segment within an untrimmed long video given a natural language query. Existing methods often suffer from inadequate training annotations, i.e., the sentence typically matches only a fraction of the prominent foreground video content with limited wording diversity. This intrinsic modality imbalance leaves a considerable portion of the visual information unaligned with text. It confines cross-modal alignment knowledge to the scope of a limited text corpus, leading to sub-optimal visual-textual modeling and poor generalizability. Leveraging the visual-textual understanding capability of multi-modal large language models (MLLMs), we propose a novel MLLM-driven framework, Text-Enhanced Alignment (TEA), to address the modality imbalance problem by enhancing correlated visual-textual knowledge. TEA uses an MLLM as a video narrator to generate plausible textual descriptions of the video, thereby mitigating the modality imbalance and boosting temporal localization. To maintain temporal sensitivity for localization, we obtain text narratives for each specific video timestamp and construct a structured text paragraph with time information, temporally aligned with the visual content. We then perform cross-modal feature merging between the temporal-aware narratives and the corresponding video temporal features to produce semantic-enhanced video representation sequences for query localization. Subsequently, we introduce a uni-modal narrative-query matching mechanism, which encourages the model to extract complementary information from contextually cohesive descriptions for improved retrieval. Extensive experiments on two benchmarks show the effectiveness and generalizability of our proposed method.
Citations: 0
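Assembling the time-stamped narrative paragraph is mostly bookkeeping around the MLLM call. In this sketch, describe_frame is a placeholder standing in for that call, and the timestamp format and sampling stride are assumptions:

```python
# Sketch: build a structured, temporally aligned narrative paragraph.
from typing import Callable, List

def build_narrative(frame_paths: List[str], fps: float,
                    describe_frame: Callable[[str], str],
                    stride: int = 16) -> str:
    lines = []
    for i in range(0, len(frame_paths), stride):
        t = i / fps  # timestamp of the sampled frame, in seconds
        lines.append(f"[{t:06.2f}s] {describe_frame(frame_paths[i])}")
    # One paragraph whose lines are aligned with the visual stream.
    return "\n".join(lines)

fake_mllm = lambda path: f"a person appears in {path}"  # stand-in for the MLLM
print(build_narrative([f"f{i:04d}.jpg" for i in range(64)], fps=16.0,
                      describe_frame=fake_mllm))
```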