Pattern Recognition: Latest Articles

Multi-hop graph structural modeling for cancer-related circRNA-miRNA interaction prediction
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-03 · DOI: 10.1016/j.patcog.2025.112078
Mengmeng Wei, Lei Wang, Xiaorui Su, Bowei Zhao, Zhuhong You
{"title":"Multi-hop graph structural modeling for cancer-related circRNA-miRNA interaction prediction","authors":"Mengmeng Wei ,&nbsp;Lei Wang ,&nbsp;Xiaorui Su ,&nbsp;Bowei Zhao ,&nbsp;Zhuhong You","doi":"10.1016/j.patcog.2025.112078","DOIUrl":"10.1016/j.patcog.2025.112078","url":null,"abstract":"<div><div>A substantial body of research indicates that circRNA can act as a sponge to absorb miRNA, thereby regulating the development of cancers. Existing circRNA-miRNA interactions (CMIs) prediction models mainly focus on single features and local structures of molecules, making it difficult to fully describe the overall properties of molecules and overlooking the multi-hierarchical associations between them. To address these challenges, we propose a computational model named GraCMI based on multi-hop graph structural modeling, which predicts CMIs by integrating structural and attribute information of molecules. GraCMI learns the representation of molecules in multi-level neighborhoods through constructing heterogeneous networks and performing high- and low-order matrix factorization. GraCMI captures both the intrinsic properties and global structures of molecules, extracting and fusing multi-source features, improving prediction accuracy. In the case studies, 7 out of the top 10 CMI pairs predicted using GraCMI on a real cancer-related dataset were confirmed. Additionally, GraCMI demonstrates a competitive advantage on two other classic datasets. Overall, the experimental results show that GraCMI can effectively predict CMIs, which is expected to provide new insights into future miRNA-mediated circRNA regulation of cancer development.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112078"},"PeriodicalIF":7.5,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144579266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
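The abstract's multi-hop neighborhood learning can be pictured with a small sketch. The NumPy snippet below aggregates node attributes over 1..k hops by repeatedly applying a symmetrically normalized adjacency matrix; this illustrates only the generic multi-hop idea, not GraCMI's actual high- and low-order matrix factorization, and all names and sizes are illustrative.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def multi_hop_features(adj: np.ndarray, feats: np.ndarray, hops: int = 3) -> np.ndarray:
    """Concatenate node features aggregated over 1..hops neighborhoods."""
    a_norm = normalize_adjacency(adj)
    reps, cur = [], feats
    for _ in range(hops):
        cur = a_norm @ cur          # one more hop of propagation
        reps.append(cur)
    return np.concatenate(reps, axis=1)

# Toy heterogeneous circRNA-miRNA graph: 4 nodes with random attributes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]], dtype=float)
x = rng.normal(size=(4, 8))
print(multi_hop_features(adj, x).shape)  # (4, 24): 3 hops x 8 attributes
```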
MHAN: Multi-head hybrid attention network for facial expression recognition
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-03 · DOI: 10.1016/j.patcog.2025.112015
Xiaofeng Wang, Tianbo Han, Songling Liu, Muhammad Shahroz Ajmal, Lu Chen, Yongqin Zhang, Yonghuai Liu
{"title":"MHAN: Multi-head hybrid attention network for facial expression recognition","authors":"Xiaofeng Wang ,&nbsp;Tianbo Han ,&nbsp;Songling Liu ,&nbsp;Muhammad Shahroz Ajmal ,&nbsp;Lu Chen ,&nbsp;Yongqin Zhang ,&nbsp;Yonghuai Liu","doi":"10.1016/j.patcog.2025.112015","DOIUrl":"10.1016/j.patcog.2025.112015","url":null,"abstract":"<div><div>Integrating Facial Expression Recognition (FER) with deep learning techniques has significantly enhanced emotion analysis performance in the past decade. Convolutional neural networks (CNNs) and attention mechanisms facilitate the automatic extraction of complex features from facial expressions. However, current methods often face challenges in accurately capturing subtle variations in expressions, tend to be computationally intensive, and are susceptible to overfitting. To address these challenges, this paper proposes a lightweight FER model based on multi-head hybrid attention networks (MHAN). It designs two innovative modules: efficient local attention mixed feature network (ELA-MFN) and multi-head hybrid attention mechanism (MHAtt). The former integrates multi-scale convolutional kernels with the ELA attention mechanism to enhance feature representation while ensuring precise localization of critical areas, all within a lightweight framework. The latter utilizes multiple attention heads to generate attention maps and capture subtle distinctions in expressions. With only 4.27M parameters (94% reduction from POSTER’s 71.8M), MHAN effectively reduces computational resource requirements, and can be efficiently implemented for both fully supervised and semi-supervised learning tasks. And it employs a smooth label loss function solving overfitting issue. We have validated the effectiveness of MHAN over three public datasets RAF-DB, AffectNet, and FERPlus, including cross-dataset tests. The results show that MHAN outperforms state-of-the-art models in terms of accuracy and computational complexity, demonstrating improved robustness. MHAN can also recognize the expressions of non-traditional datasets like sculptures, validating its cross-domain generalization capabilities. The source code is available at <span><span>https://github.com/hanyao666/MHAN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112015"},"PeriodicalIF":7.5,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144596969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
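The abstract does not give MHAtt's internals, so the following PyTorch sketch shows only the generic mechanism it builds on: several attention heads attending over flattened CNN feature-map tokens, each producing its own attention map. The module name and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TinyMultiHeadAttention(nn.Module):
    """Generic multi-head self-attention over a flattened feature map."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); each head yields its own attention map.
        out, attn_maps = self.attn(x, x, x, need_weights=True,
                                   average_attn_weights=False)
        self.last_maps = attn_maps        # (batch, heads, tokens, tokens)
        return self.norm(x + out)         # residual connection + norm

feats = torch.randn(2, 49, 64)            # e.g. a 7x7 CNN feature map, flattened
block = TinyMultiHeadAttention()
print(block(feats).shape, block.last_maps.shape)
```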
Improving adversarial transferability and imperceptibility with loss landscape and diffusion model
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-03 · DOI: 10.1016/j.patcog.2025.112076
Jiayang Liu, Weiming Zhang, Han Fang, Wenbo Zhou, Ee-Chien Chang, Siew-Kei Lam
{"title":"Improving adversarial transferability and imperceptibility with loss landscape and diffusion model","authors":"Jiayang Liu ,&nbsp;Weiming Zhang ,&nbsp;Han Fang ,&nbsp;Wenbo Zhou ,&nbsp;Ee-Chien Chang ,&nbsp;Siew-Kei Lam","doi":"10.1016/j.patcog.2025.112076","DOIUrl":"10.1016/j.patcog.2025.112076","url":null,"abstract":"<div><div>Deep neural networks (DNNs) are vulnerable to adversarial examples, which introduce imperceptible perturbations on benign samples to mislead the prediction of DNNs. Transferability is a key property of adversarial examples, which enables adversarial examples crafted for one network to deceive other networks with high probability. However, the adversarial perturbations introduced by transferable attacks are perceptible to human observers. Although there are unrestricted attacks which can achieve good visual imperceptibility, the adversarial transferability of these attacks remains relatively low. In this paper, we propose to improve adversarial transferability and imperceptibility of adversarial examples via flat loss landscape and diffusion models. Specifically, we utilize denoising diffusion implicit model (DDIM) inversion operation to map the input image back to the diffusion latent space. Then we add perturbations on the diffusion latent space to achieve successful attacks on the surrogate model and flat input loss landscape, resulting in high adversarial transferability and imperceptible perturbations to human observers. Extensive experiments demonstrate that our proposed method enhances adversarial transferability while preserving the imperceptibility of the generated adversarial examples.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112076"},"PeriodicalIF":7.5,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144580519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
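Pulling the abstract's pipeline into a rough PyTorch sketch: invert the image to a DDIM latent, then optimize a latent perturbation that raises the surrogate's loss while also raising it at a nearby latent, a crude single-sample proxy for a flat loss landscape. Here ddim_invert and ddim_reconstruct are hypothetical callables standing in for a real diffusion model's (differentiable) inversion and sampling routines, and the paper's actual flatness objective may differ.

```python
import torch
import torch.nn.functional as F

def latent_attack(x, y, surrogate, ddim_invert, ddim_reconstruct,
                  steps=10, lr=0.01, flat_radius=0.01):
    """Perturb a DDIM latent to fool `surrogate` (untargeted attack)."""
    z = ddim_invert(x).detach()                 # map image to diffusion latent
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = ddim_reconstruct(z + delta)     # decode perturbed latent
        loss = -F.cross_entropy(surrogate(x_adv), y)   # maximize surrogate CE
        # Crude flatness proxy: also raise the loss at a randomly shifted
        # latent, encouraging a flat region around the solution.
        z_near = z + delta + flat_radius * torch.randn_like(z)
        loss = loss - F.cross_entropy(surrogate(ddim_reconstruct(z_near)), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ddim_reconstruct(z + delta).detach()
```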
Feature selection based on rough diversity entropy
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-03 · DOI: 10.1016/j.patcog.2025.112032
Xiongtao Zou, Jianhua Dai
{"title":"Feature selection based on rough diversity entropy","authors":"Xiongtao Zou,&nbsp;Jianhua Dai","doi":"10.1016/j.patcog.2025.112032","DOIUrl":"10.1016/j.patcog.2025.112032","url":null,"abstract":"<div><div>Information entropy, as a powerful tool for measuring the uncertainty of information, is widely used in many fields such as communication, data compression, data mining and bioinformatics. However, the classical information entropy has two shortcomings, that is, information entropy cannot accurately measure the uncertainty of knowledge in some cases and the joint probability in information entropy is usually difficult to calculate for high-dimensional data. Additionally, uncertainty measure is the foundation of feature selection in granular computing. Inaccurate measures may lead to poor performance of feature selection methods. To address these issues, we propose a novel uncertainty measure called rough diversity entropy based on rough set theory. Rough diversity entropy can more accurately measure the uncertainty of knowledge compared with the classical information entropy. In this article, rough diversity entropy and its variants are first defined, and their related properties are studied. Next, a heuristic feature selection method based on the defined measures is put forward, and the corresponding algorithm is also designed. Finally, a series of experiments are executed to validate the effectiveness and rationality of the proposed method. The analysis results show that our proposed method has good performance compared with eight existing feature selection methods. Moreover, the proposed method improves the average accuracy of 15 datasets by 6.53% under four classifiers, and achieves an average feature reduction rate of up to 99.81%. We believe that the proposed method is an effective feature selection approach for classification learning.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112032"},"PeriodicalIF":7.5,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144572736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
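Rough diversity entropy itself is defined in the paper, but its rough-set ingredients are standard and can be sketched. The snippet below computes the indiscernibility partition induced by a feature subset and the classical entropy of that partition, H = -Σ (|X_i|/|U|) log2(|X_i|/|U|); treat this as the classical baseline the paper improves upon, not the proposed measure.

```python
from collections import defaultdict
import math

def partition(rows, feature_idx):
    """Equivalence classes (granules) under the indiscernibility relation."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[j] for j in feature_idx)
        classes[key].append(i)
    return list(classes.values())

def partition_entropy(rows, feature_idx):
    """Classical partition entropy: -sum |X_i|/|U| * log2(|X_i|/|U|)."""
    n = len(rows)
    return -sum(len(b) / n * math.log2(len(b) / n)
                for b in partition(rows, feature_idx))

data = [("a", 0), ("a", 1), ("b", 0), ("b", 0)]
print(partition(data, [0]))            # [[0, 1], [2, 3]]
print(partition_entropy(data, [0]))    # 1.0
```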
ST-KeyS: Self-supervised Transformer for Keyword Spotting in historical handwritten documents
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-02 · DOI: 10.1016/j.patcog.2025.112036
Sana Khamekhem Jemni, Sourour Ammar, Mohamed Ali Souibgui, Yousri Kessentini, Abbas Cheddad
{"title":"ST-KeyS: Self-supervised Transformer for Keyword Spotting in historical handwritten documents","authors":"Sana Khamekhem Jemni ,&nbsp;Sourour Ammar ,&nbsp;Mohamed Ali Souibgui ,&nbsp;Yousri Kessentini ,&nbsp;Abbas Cheddad","doi":"10.1016/j.patcog.2025.112036","DOIUrl":"10.1016/j.patcog.2025.112036","url":null,"abstract":"<div><div>Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. Nowadays, the most efficient KWS methods rely on machine learning techniques, which typically require a large amount of annotated training data. However, in the case of historical manuscripts, there is a lack of annotated corpora for training. To handle the data scarcity issue, we investigate the merits of self-supervised learning to extract useful representations of the input data without relying on human annotations and then use these representations in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers where the pretraining stage is based on the mask-and-predict paradigm without the need for labeled data. In the fine-tuning stage, the pre-trained encoder is integrated into a fine-tuned Siamese neural network model to improve feature embedding from the input images. We further improve the image representation using pyramidal histogram of characters (PHOC) embedding to create and exploit an intermediate representation of images based on text attributes. The proposed approach outperforms state-of-the-art methods trained on the same datasets in an exhaustive experimental evaluation of five widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle, George Washington, Esposalles, and RIMES).</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112036"},"PeriodicalIF":7.5,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
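PHOC is a standard, well-documented representation, so it can be sketched concretely: at each pyramid level the word is split into equal regions, and each region records which alphabet characters occur in it. The version below assigns a character to a region by its center position, a common simplification of the original overlap rule; the levels and alphabet here are illustrative defaults, not necessarily those of ST-KeyS.

```python
def phoc(word: str, alphabet: str = "abcdefghijklmnopqrstuvwxyz",
         levels=(1, 2, 3, 4)) -> list:
    """Pyramidal Histogram Of Characters: one binary block of size
    len(alphabet) per region, regions = `level` splits per pyramid level."""
    word = word.lower()
    idx = {c: i for i, c in enumerate(alphabet)}
    vec = []
    for level in levels:
        regions = [[0] * len(alphabet) for _ in range(level)]
        for pos, ch in enumerate(word):
            if ch not in idx:
                continue
            center = (pos + 0.5) / len(word)          # normalized position
            r = min(int(center * level), level - 1)   # region at this level
            regions[r][idx[ch]] = 1
        for reg in regions:
            vec.extend(reg)
    return vec

v = phoc("washington")
print(len(v), sum(v))   # 260 dims for levels 1+2+3+4 over 26 letters
```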
Robust oblique projection and weighted NMF for hyperspectral unmixing
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-02 · DOI: 10.1016/j.patcog.2025.112029
Yasin Hashemi-Nazari, Azita Tajaddini, Farid Saberi-Movahed, Fernando Alonso-Fernandez, Prayag Tiwari
{"title":"Robust oblique projection and weighted NMF for hyperspectral unmixing","authors":"Yasin Hashemi-Nazari ,&nbsp;Azita Tajaddini ,&nbsp;Farid Saberi-Movahed ,&nbsp;Fernando Alonso-Fernandez ,&nbsp;Prayag Tiwari","doi":"10.1016/j.patcog.2025.112029","DOIUrl":"10.1016/j.patcog.2025.112029","url":null,"abstract":"<div><div>Hyperspectral unmixing (HU) is a crucial method for interpreting remotely sensed hyperspectral images (HSIs), with the aim of splitting the image into pure spectral components (endmembers) and their abundance fractions in every pixel of the scene. However, the effectiveness of this procedure is hindered by the presence of noise and anomalies. These kind of disruptions mainly arise from real-world factors such as atmospheric effects and endmember variability. To address this challenge, a novel approach called Graph-Regularized Oblique Projection Weighted NMF (GOP-WNMF) is introduced, which is grounded in a more precise separation of signal and noise subspaces, aiming to enhance the accuracy and robustness of the analysis. GOP-WNMF achieves this by constructing an oblique projector that projects each pixel onto the signal subspace, i.e., the space formed by signatures of endmembers, and parallel to the noise subspace. This approach effectively suppresses noise while preserving crucial spectral information. Furthermore, our new oblique NMF framework includes a unique residual-based weighting approach to detect and remove anomalies in pixels and spectral bands simultaneously. In addition to this, another weighting matrix is proposed by establishing a bipartite graph connecting endmembers and pixels to promote smoothness and sparsity in the resulting abundance maps. GOP-WNMF also enhances abundance map estimation accuracy by mitigating the negative effects of pixel outliers through the utilization of Laplacian eigenmaps technique to maintain the manifold structure of data. The effectiveness of GOP-WNMF is evaluated through comprehensive testing on synthetic and real HSIs, and its superiority is demonstrated over multiple state-of-the-art approaches. The source code is also available at <span><span>https://github.com/yasinhashemi/GOP-WNMF</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112029"},"PeriodicalIF":7.5,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144589110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
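The oblique projector the abstract describes, which passes the signal (endmember) subspace through unchanged while annihilating the noise subspace, has a textbook closed form: E (EᵀP E)⁻¹ EᵀP, where P is the orthogonal projector onto the complement of the noise subspace. The NumPy sketch below implements that formula; how GOP-WNMF estimates the two subspaces and builds its weighting matrices is beyond this snippet.

```python
import numpy as np

def oblique_projector(E: np.ndarray, N: np.ndarray) -> np.ndarray:
    """Projector onto range(E) along range(N): E (E^T Pn E)^-1 E^T Pn,
    where Pn = I - N N^+ projects orthogonally onto range(N)'s complement."""
    Pn = np.eye(N.shape[0]) - N @ np.linalg.pinv(N)
    return E @ np.linalg.inv(E.T @ Pn @ E) @ E.T @ Pn

rng = np.random.default_rng(1)
E = rng.normal(size=(10, 3))     # signal (endmember) basis, 10 bands
N = rng.normal(size=(10, 2))     # noise basis
P = oblique_projector(E, N)
print(np.allclose(P @ E, E))     # True: signal passes through unchanged
print(np.allclose(P @ N, 0))     # True: noise is annihilated
```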
Mixture of coarse and fine-grained prompt tuning for vision-language model
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-02 · DOI: 10.1016/j.patcog.2025.112074
Yansheng Gao, Zixi Zhu, Shengsheng Wang
{"title":"Mixture of coarse and fine-grained prompt tuning for vision-language model","authors":"Yansheng Gao ,&nbsp;Zixi Zhu ,&nbsp;Shengsheng Wang","doi":"10.1016/j.patcog.2025.112074","DOIUrl":"10.1016/j.patcog.2025.112074","url":null,"abstract":"<div><div>Visual Language Models (VLMs) exhibit impressive performance across various tasks but often suffer from degradation of prior knowledge when transferred to downstream tasks with limited computational samples. Prompt tuning methods emerge as an effective solution to mitigate this issue. However, most existing approaches solely rely on coarse-grained text prompt or fine-grained text prompt, which may limit the discriminative and generalization capabilities of VLMs. To address these limitations, we propose <strong>Mixture of Coarse and Fine-grained Prompt Tuning (MCFPT)</strong>, a novel method that integrates both coarse and fine-grained prompts to enhance the performance of VLMs. Inspired by the Mixture-of-Experts (MoE) mechanism, MCFPT incorporates a <strong>Mixed Fusion Module (MFM)</strong> to fuse and select coarse domain-shared text feature and fine-grained category-discriminative text feature to get the mixed feature. Additionally, a <strong>Dynamic Refinement Adapter (DRA)</strong> is introduced to adjust category distributions, ensuring consistency between refined and mixed text features. These components collectively improve the generalization and discriminative power of VLMs. Extensive experiments across four scenarios-base-to-new, few-shot classification, domain generalization, and cross-domain classification-demonstrate that MCFPT achieves exceptional performance compared to state-of-the-art methods, with significant improvements in HM scores across multiple datasets. Our findings highlight MCFPT as a robust approach for improving the adaptability and efficiency of Visual Language Models in diverse application domains.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112074"},"PeriodicalIF":7.5,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144557233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
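The Mixed Fusion Module is described only as MoE-inspired fusion and selection over the coarse and fine-grained text features. A minimal reading of that, sketched below in PyTorch, is a learned softmax gate weighting the two streams; the real MFM is likely more elaborate, so treat the module and dimensions as assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamGate(nn.Module):
    """Softmax-gated mixture of a coarse and a fine-grained text feature."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2)   # scores for the two "experts"

    def forward(self, coarse: torch.Tensor, fine: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.gate(torch.cat([coarse, fine], dim=-1)), dim=-1)
        return w[..., :1] * coarse + w[..., 1:] * fine  # mixed text feature

coarse = torch.randn(4, 512)   # domain-shared prompt feature
fine = torch.randn(4, 512)     # category-discriminative prompt feature
mixed = TwoStreamGate()(coarse, fine)
print(mixed.shape)             # torch.Size([4, 512])
```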
Enhancing outdoor vision: Binocular desnowing with dual-stream temporal transformer
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-02 · DOI: 10.1016/j.patcog.2025.112075
En Yu, Jie Lu, Kaihao Zhang, Guangquan Zhang
{"title":"Enhancing outdoor vision: Binocular desnowing with dual-stream temporal transformer","authors":"En Yu,&nbsp;Jie Lu,&nbsp;Kaihao Zhang,&nbsp;Guangquan Zhang","doi":"10.1016/j.patcog.2025.112075","DOIUrl":"10.1016/j.patcog.2025.112075","url":null,"abstract":"<div><div>Video desnowing, aimed at removing snowflakes and enhancing the quality of videos, is a crucial yet intricate task essential for improving the effectiveness of outdoor vision systems. Compared to rain and haze, the inherent opacity and diverse morphology of snowflakes result in more pronounced background occlusions, thereby challenging the efficacy of current desnowing techniques, particularly those focusing solely on images or videos captured from a monocular perspective. To address these challenges, this paper proposes a Dual-Stream Temporal Transformer (DSTT) to advance snow removal and visual enhancement by leveraging comprehensive information from stereo views and spatial-temporal cues. More specifically, it incorporates a Dual-Stream Weight-shared Transformer (DSWT) module to exploit spatial information from different views. This module employs a hierarchical weight-sharing strategy to extract fused spatial features across different views from low-level to high-level layers. Subsequently, the Dual-Stream ConvLSTM (DS-CLSTM) module is introduced to capture temporal correlations across streaming frames. By combining temporal-spatial cues and complementary details from diverse views, videos can be effectively restored while preserving the original content’s details. In addition, two binocular snowy datasets – SnowKITTI2012 and SnowKITTI 2015 – are presented, providing a valuable resource for evaluating the binocular desnowing task. Comprehensive experiments evaluated on both synthetic and real-world snowy datasets demonstrate that our proposed method outperforms the state-of-the-art baselines.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112075"},"PeriodicalIF":7.5,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144563005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
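As a minimal sketch of the weight-sharing idea behind DSWT (the transformer and ConvLSTM details are omitted), the PyTorch module below applies one shared encoder to both stereo views and fuses the two feature streams. Fusing by concatenation plus a 1x1 convolution is an assumption here, not necessarily the paper's choice.

```python
import torch
import torch.nn as nn

class DualStreamEncoder(nn.Module):
    """One shared encoder applied to both stereo views, then fused."""
    def __init__(self, ch: int = 16):
        super().__init__()
        self.shared = nn.Sequential(             # weights shared across views
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv2d(2 * ch, ch, 1)     # mix left/right features

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        fl, fr = self.shared(left), self.shared(right)
        return self.fuse(torch.cat([fl, fr], dim=1))

l = torch.randn(1, 3, 64, 64)
r = torch.randn(1, 3, 64, 64)
print(DualStreamEncoder()(l, r).shape)   # torch.Size([1, 16, 64, 64])
```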
EC-SLAM: Effectively constrained neural RGB-D SLAM with TSDF hash encoding and joint optimization
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-02 · DOI: 10.1016/j.patcog.2025.112034
Guanghao Li, Qi Chen, Yuxiang Yan, Jian Pu
{"title":"EC-SLAM: Effectively constrained neural RGB-D SLAM with TSDF hash encoding and joint optimization","authors":"Guanghao Li ,&nbsp;Qi Chen ,&nbsp;Yuxiang Yan ,&nbsp;Jian Pu","doi":"10.1016/j.patcog.2025.112034","DOIUrl":"10.1016/j.patcog.2025.112034","url":null,"abstract":"<div><div>We introduce EC-SLAM, a real-time dense RGB-D Simultaneous Localization and Mapping (SLAM) system leveraging Neural Radiance Fields (NeRF). While recent NeRF-based SLAM systems have shown promising results, they have yet to exploit NeRF’s potential to estimate system state fully. EC-SLAM overcomes this limitation by using a Truncated Signed Distance Fields (TSDF) opacity function with sharp inductive bias to strengthen constraints in sparse parametric encodings, which reduces the number of model parameters and enhances accuracy. Additionally, our system employs a highly constrained global joint optimization approach coupled with a feature-based, uniform sampling algorithm, enabling efficient fusion between TSDF and sparse parametric encodings. This approach reinforces constraints on keyframes most relevant to the current frame, mitigates the influence of random sampling, and effectively utilizes NeRF’s implicit loop closure capability. Extensive evaluations and ablations on the Replica, ScanNet, and TUM datasets demonstrate state-of-the-art performance, achieving precise tracking and reconstruction while maintaining real-time operation at up to 21 FPS. The source code is available at <span><span>https://github.com/Lightingooo/EC-SLAM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112034"},"PeriodicalIF":7.5,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144548748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
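The abstract's "TSDF opacity function with sharp inductive bias" is not spelled out; a mapping commonly used in neural RGB-D SLAM converts a truncated signed distance d into a rendering weight via sigmoid(d/tr) · sigmoid(-d/tr), which peaks sharply at the surface (d = 0). The sketch below implements that common form; whether EC-SLAM uses exactly this function is an assumption.

```python
import torch

def tsdf_weights(sdf: torch.Tensor, trunc: float = 0.05) -> torch.Tensor:
    """Bell-shaped weights sigmoid(d/tr) * sigmoid(-d/tr), peaked at the
    zero crossing (the surface) of the samples along each camera ray."""
    w = torch.sigmoid(sdf / trunc) * torch.sigmoid(-sdf / trunc)
    return w / (w.sum(dim=-1, keepdim=True) + 1e-8)   # normalize per ray

# Signed distances of samples along one ray, crossing the surface at 0.
sdf = torch.linspace(0.2, -0.2, steps=9).unsqueeze(0)
print(tsdf_weights(sdf).squeeze(0))   # mass concentrates near the crossing
```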
IDEAL: Independent domain embedding augmentation learning
IF 7.5 · CAS Q1 (Computer Science)
Pattern Recognition · Pub Date: 2025-07-01 · DOI: 10.1016/j.patcog.2025.112024
Zelin Yang, Lin Xu, Shiyang Yan, Haixia Bi, Fan Li
{"title":"IDEAL: Independent domain embedding augmentation learning","authors":"Zelin Yang ,&nbsp;Lin Xu ,&nbsp;Shiyang Yan ,&nbsp;Haixia Bi ,&nbsp;Fan Li","doi":"10.1016/j.patcog.2025.112024","DOIUrl":"10.1016/j.patcog.2025.112024","url":null,"abstract":"<div><div>Deep metric learning is fundamental to open-set pattern recognition and has become a focal point of research in recent years. Significant efforts have been devoted to designing sampling, mining, and weighting strategies within algorithmic-level deep metric learning (DML) loss objectives. However, less attention has been paid to input-level but essential data transformations. In this paper, we develop a novel mechanism, independent domain embedding augmentation learning (IDEAL) method. It can simultaneously learn multiple independent embedding spaces for multiple domains generated by predefined data transformations. Our IDEAL is orthogonal to existing DML techniques and can be seamlessly combined with one DML approach for enhanced performance. Empirical results on visual retrieval tasks demonstrate the superiority of the proposed method. For instance, IDEAL significantly improves the performance of both Multi-Similarity (MS) Loss and Hypergraph-Induced Semantic Tuplet (HIST) loss. Specifically, it boosts the Recall<span><math><mrow><mi>@</mi><mn>1</mn></mrow></math></span> from 84.5% <span><math><mo>→</mo></math></span> 87.1% for MS Loss on Cars-196 and from 65.8% <span><math><mo>→</mo></math></span> 69.5% on CUB-200. Similarly, for HIST loss, IDEAL improves the performance on Cars-196 from 87.4% <span><math><mo>→</mo></math></span> 90.3%, on CUB-200 from 69.7% to 72.3%. It significantly outperforms methods using basic network architectures (e.g., ResNet-50, BN-Inception), such as XBM and Intra-Batch. The source code of our proposed method is available at <span><span>https://github.com/emdata-ailab/Ideal-learning</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"170 ","pages":"Article 112024"},"PeriodicalIF":7.5,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144572725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
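The core mechanism per the abstract, one independent embedding space per transformation domain, can be sketched as a shared backbone with a separate projection head for each domain. The backbone, dimensions, and the metric-learning loss (omitted here) are placeholders, not IDEAL's actual architecture.

```python
import torch
import torch.nn as nn

class IndependentDomainEmbedder(nn.Module):
    """Shared backbone + one embedding head per transformation domain."""
    def __init__(self, n_domains: int = 3, feat: int = 128, emb: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(feat, emb) for _ in range(n_domains))

    def forward(self, x: torch.Tensor, domain: int) -> torch.Tensor:
        z = self.heads[domain](self.backbone(x))
        return nn.functional.normalize(z, dim=-1)  # unit-norm embedding

model = IndependentDomainEmbedder()
views = [torch.randn(4, 3, 32, 32) for _ in range(3)]  # 3 transformed views
embs = [model(v, d) for d, v in enumerate(views)]      # one space per domain
print([e.shape for e in embs])
```

Each head would then be trained with its own metric-learning objective (e.g., an MS-style loss) on the samples of its domain, keeping the embedding spaces independent.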