Latest Publications in Pattern Recognition

Cross-modal adapter for vision–language retrieval
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-03 · DOI: 10.1016/j.patcog.2024.111144 · Vol. 159, Article 111144
Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Shiji Song, Gao Huang
Abstract: Vision–language retrieval is an important multi-modal learning topic, where the goal is to retrieve the most relevant visual candidate for a given text query. Recently, pre-trained models, e.g., CLIP, have shown great potential on retrieval tasks. However, as pre-trained models scale up, fully fine-tuning them on downstream retrieval datasets carries a high risk of overfitting. Moreover, in practice, it would be costly to train and store a large model for each task. To overcome these issues, we present a novel Cross-Modal Adapter for parameter-efficient transfer learning. Inspired by adapter-based methods, we adjust the pre-trained model with a few parameterization layers. However, there are two notable differences. First, our method is designed for the multi-modal domain. Second, it allows encoder-level implicit cross-modal interactions between the vision and language encoders. Although surprisingly simple, our approach has three notable benefits: (1) it reduces the vast majority of fine-tuned parameters, (2) it saves training time, and (3) it allows all the pre-trained parameters to be fixed, enabling the pre-trained model to be shared across datasets. Extensive experiments demonstrate that, without bells and whistles, our approach outperforms adapter-based methods on image–text retrieval datasets (MSCOCO, Flickr30K) and video–text retrieval datasets (MSR-VTT, DiDeMo, and ActivityNet).
Citations: 0
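The adapter mechanism described above follows the familiar bottleneck pattern. Below is a minimal PyTorch sketch of that pattern with a weight-shared down-projection standing in for the encoder-level cross-modal interaction; the dimensions, module layout, and sharing scheme are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalAdapter(nn.Module):
    """Bottleneck adapter with a shared down-projection (illustrative sketch).

    The shared `down` layer is one guess at how encoder-level implicit
    cross-modal interaction could be realized; the paper's design may differ.
    """
    def __init__(self, dim: int = 512, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # shared across modalities
        self.act = nn.GELU()
        self.up_vision = nn.Linear(bottleneck, dim)
        self.up_text = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        up = self.up_vision if modality == "vision" else self.up_text
        return x + up(self.act(self.down(x)))    # residual connection

# Usage: freeze the pre-trained encoder, train only the adapter.
adapter = CrossModalAdapter()
tokens = torch.randn(2, 77, 512)                 # e.g. CLIP-like text features
out = adapter(tokens, modality="text")
print(out.shape)  # torch.Size([2, 77, 512])
```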
Image shadow removal via multi-scale deep Retinex decomposition
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111126 · Vol. 159, Article 111126
Yan Huang, Xinchang Lu, Yuhui Quan, Yong Xu, Hui Ji
Abstract: In recent years, deep learning has emerged as an important tool for image shadow removal. However, existing methods often prioritize shadow detection and, in doing so, oversimplify the lighting conditions of shadow regions. Furthermore, these methods neglect cues from the overall image lighting when re-lighting shadow areas, thereby failing to ensure global lighting consistency. To address these challenges in images captured under complex lighting conditions, this paper introduces a multi-scale network built on a Retinex decomposition model. The proposed approach effectively senses shadows with uneven lighting and re-lights them, achieving greater consistency along shadow boundaries. Furthermore, for the network design, we introduce several techniques for boosting shadow-removal performance, including a shadow-aware channel attention module, local discriminative and Retinex decomposition loss functions, and a multi-scale mechanism that guides the Retinex decomposition by concurrently capturing both fine-grained details and large-scale contextual information. Experimental results demonstrate the superiority of the proposed method over existing solutions, particularly for images taken under complex lighting conditions.
Citations: 0
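The Retinex model underlying this work assumes an image S factors into reflectance and illumination, S = R ⊙ L. The sketch below illustrates that assumption with a toy decomposition network and two standard loss terms (reconstruction and illumination smoothness); the architecture and loss weights are placeholders, not the paper's multi-scale design.

```python
import torch
import torch.nn as nn

class TinyDecomposer(nn.Module):
    """Toy network predicting reflectance R and illumination L such that
    the input image is approximately R * L (the Retinex assumption).
    An illustrative stand-in for the paper's multi-scale network."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 4, 3, padding=1),  # 3-ch reflectance + 1-ch illumination
        )

    def forward(self, img):
        out = self.body(img)
        R = torch.sigmoid(out[:, :3])        # reflectance in [0, 1]
        L = torch.sigmoid(out[:, 3:])        # single-channel illumination
        return R, L

def retinex_loss(img, R, L):
    recon = (R * L - img).abs().mean()       # reconstruction: R * L should match S
    # smoothness prior: illumination is expected to vary slowly
    smooth = (L[..., :, 1:] - L[..., :, :-1]).abs().mean() + \
             (L[..., 1:, :] - L[..., :-1, :]).abs().mean()
    return recon + 0.1 * smooth              # 0.1 is an arbitrary weight

img = torch.rand(1, 3, 64, 64)
R, L = TinyDecomposer()(img)
print(retinex_loss(img, R, L).item())
```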
S2Match: Self-paced sampling for data-limited semi-supervised learning
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111121 · Vol. 159, Article 111121
Dayan Guan, Yun Xing, Jiaxing Huang, Aoran Xiao, Abdulmotaleb El Saddik, Shijian Lu
Abstract: Data-limited semi-supervised learning tends to be severely degraded by miscalibration (i.e., misalignment between the confidence and correctness of predicted pseudo labels) and becomes stuck at poor local minima while learning repeatedly from the same set of over-confident yet incorrect pseudo labels. We design a simple and effective self-paced sampling technique that greatly alleviates the impact of miscalibration and learns more accurate semi-supervised models from limited training data. Instead of employing static or dynamic confidence thresholds, which are sensitive to miscalibration, the proposed self-paced sampling follows a simple linear policy to select pseudo labels, which mitigates repeated learning from the same set of falsely predicted pseudo labels at the early training stage and effectively lowers the chance of becoming stuck at local minima. Despite its simplicity, extensive evaluations over multiple data-limited semi-supervised tasks show that the proposed self-paced sampling consistently outperforms the state of the art by large margins.
Citations: 0
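The "simple linear policy" in the abstract suggests a keep-ratio for confident pseudo labels that grows linearly with training progress, rather than a fixed confidence threshold. A minimal sketch of one such schedule follows; the function name, `r0`, and the exact schedule are our assumptions, not the paper's verbatim policy.

```python
import torch

def self_paced_select(probs: torch.Tensor, step: int, total_steps: int,
                      r0: float = 0.2):
    """Select pseudo labels with a linearly growing keep-ratio.

    probs: (N, C) softmax outputs on unlabeled data.
    Keeps the top r(t) fraction by confidence, where
    r(t) = r0 + (1 - r0) * t / T grows linearly with training progress.
    """
    conf, pseudo = probs.max(dim=1)
    ratio = r0 + (1.0 - r0) * step / total_steps
    k = max(1, int(ratio * len(conf)))
    keep = conf.topk(k).indices
    return keep, pseudo[keep]

probs = torch.softmax(torch.randn(1000, 10), dim=1)
idx, labels = self_paced_select(probs, step=10, total_steps=100)
print(len(idx))  # ~280 of 1000 at 10% progress with r0 = 0.2
```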
Consistency-driven feature scoring and regularization network for visible–infrared person re-identification
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111131 · Vol. 159, Article 111131
Xueting Chen, Yan Yan, Jing-Hao Xue, Nannan Wang, Hanzi Wang
Abstract: Recently, visible–infrared person re-identification (VI-ReID) has received considerable attention due to its practical importance. A number of methods extract multiple local features to enrich the diversity of feature representations. However, some local features often involve modality-relevant information, leading to deteriorated performance. Moreover, existing methods optimize the models by considering only the samples in each batch, ignoring the features learned at previous iterations. As a result, the features of the same person's images change drastically across training epochs, hindering training stability. To alleviate these issues, we propose a novel consistency-driven feature scoring and regularization network (CFSR-Net) for VI-ReID, which consists of a backbone network, a local feature learning block, a feature scoring block, and a global–local feature fusion block. On the one hand, we design a cross-modality consistency loss to highlight modality-irrelevant local features and suppress modality-relevant local features for each modality, facilitating the generation of a reliable, compact local feature. On the other hand, we develop a feature consistency regularization strategy (including a momentum class contrastive loss and a momentum distillation loss) to impose consistency regularization on the learning of different levels of features by considering the features learned at historical epochs. This effectively enables smooth feature changes and thus improves training stability. Extensive experiments on public VI-ReID datasets clearly show the effectiveness of our method against several state-of-the-art VI-ReID methods. Code will be released at https://github.com/cxtjl/CFSR-Net.
Citations: 0
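One plausible reading of the momentum class contrastive loss is an exponential-moving-average bank of class centers built from historical features, with current features pulled toward their own class center. The sketch below implements that reading; the class count, momentum value, and temperature are illustrative, and the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

class MomentumCenters:
    """EMA class centers used as a feature-consistency regularizer
    (an illustrative reading of the paper's momentum losses)."""
    def __init__(self, num_classes: int, dim: int, m: float = 0.9):
        self.centers = torch.zeros(num_classes, dim)
        self.m = m

    @torch.no_grad()
    def update(self, feats, labels):
        # EMA update of each class center seen in the batch
        for c in labels.unique():
            mean = feats[labels == c].mean(dim=0)
            self.centers[c] = self.m * self.centers[c] + (1 - self.m) * mean

    def loss(self, feats, labels, tau: float = 0.1):
        # contrast each feature against all class centers; own center is positive
        logits = F.normalize(feats, dim=1) @ F.normalize(self.centers, dim=1).T
        return F.cross_entropy(logits / tau, labels)

bank = MomentumCenters(num_classes=395, dim=256)   # e.g. SYSU-MM01 has 395 IDs
feats = torch.randn(32, 256)
labels = torch.randint(0, 395, (32,))
bank.update(feats.detach(), labels)
print(bank.loss(feats, labels).item())
```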
ANNE: Adaptive Nearest Neighbours and Eigenvector-based sample selection for robust learning with noisy labels
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111132 · Vol. 159, Article 111132
Filipe R. Cordeiro, Gustavo Carneiro
Abstract: An important stage of most state-of-the-art (SOTA) noisy-label learning methods is a sample selection procedure that classifies samples from the noisy-label training set into noisy-label or clean-label subsets. Sample selection typically follows one of two approaches: loss-based sampling, where high-loss samples are considered to have noisy labels, or feature-based sampling, where samples from the same class tend to cluster together in the feature space and noisy-label samples are identified as anomalies within those clusters. Empirically, loss-based sampling is robust to a wide range of noise rates, while feature-based sampling tends to work effectively only in particular scenarios; e.g., filtering noisy instances via their eigenvectors (FINE) exhibits greater robustness at low noise rates, while K-nearest-neighbour (KNN) sampling better mitigates high-noise-rate problems. This paper introduces the Adaptive Nearest Neighbours and Eigenvector-based (ANNE) sample selection methodology, a novel approach that integrates loss-based sampling with the feature-based sampling methods FINE and Adaptive KNN to optimize performance across a wide range of noise-rate scenarios. ANNE achieves this integration by first partitioning the training set into high-loss and low-loss sub-groups using loss-based sampling. Subsequently, within the low-loss subset, sample selection is performed using FINE, while the high-loss subset employs Adaptive KNN for effective sample selection. We integrate ANNE into the SOTA noisy-label learning method SSR+ and test it on CIFAR-10/-100 (with symmetric, asymmetric, and instance-dependent noise), WebVision, and ANIMAL-10, where our method shows better accuracy than the SOTA in most experiments, with competitive training time. The code is available at https://github.com/filipe-research/anne.
Citations: 0
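The selection pipeline described above can be sketched end-to-end: a two-component GMM on per-sample losses splits the set, a FINE-style principal-eigenvector score filters the low-loss subset per class, and KNN label agreement filters the high-loss subset. This is a deliberately simplified illustration; the thresholds, the "adaptive" part of Adaptive KNN, and all hyper-parameters here are placeholders.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import KNeighborsClassifier

def anne_select(feats, labels, losses, k=10):
    """Simplified ANNE-style sample selection; returns a clean-sample mask."""
    # 1) loss-based split: the GMM component with the lower mean is "low-loss"
    gmm = GaussianMixture(n_components=2).fit(losses.reshape(-1, 1))
    low_comp = np.argmin(gmm.means_.ravel())
    is_low = gmm.predict(losses.reshape(-1, 1)) == low_comp

    clean = np.zeros(len(labels), dtype=bool)
    # 2) low-loss subset: FINE-style alignment with the class principal direction
    for c in np.unique(labels):
        idx = np.where(is_low & (labels == c))[0]
        if len(idx) < 2:
            continue
        f = feats[idx]
        _, _, vt = np.linalg.svd(f, full_matrices=False)
        score = np.abs(f @ vt[0])                     # eigenvector alignment
        clean[idx[score > np.median(score)]] = True   # illustrative threshold

    # 3) high-loss subset: keep samples whose k nearest low-loss neighbours agree
    hi = np.where(~is_low)[0]
    if len(hi):
        knn = KNeighborsClassifier(n_neighbors=k).fit(feats[is_low], labels[is_low])
        clean[hi] = knn.predict(feats[hi]) == labels[hi]
    return clean

feats = np.random.randn(500, 128)
labels = np.random.randint(0, 10, 500)
losses = np.random.rand(500)
print(anne_select(feats, labels, losses).sum())
```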
Self-supervised random mask attention GAN in tackling pose-invariant face recognition
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111112 · Vol. 159, Article 111112
Jiashu Liao, Tanaya Guha, Victor Sanchez
Abstract: Pose-Invariant Face Recognition (PIFR) has advanced significantly with Generative Adversarial Networks (GANs), which rotate face images acquired at any angle to a frontal view for enhanced recognition. However, such frontalization methods typically need ground-truth frontal-view images, often collected under strict laboratory conditions, making the necessary training data challenging and costly to acquire. Additionally, traditional self-supervised PIFR methods rely on external rendering models for training, further complicating the overall training process. To tackle these two issues, we propose a new framework called Mask Rotate. Our framework introduces a novel training approach that requires no paired ground-truth data for the face image frontalization task. Moreover, it eliminates the need for an external rendering model during training. Specifically, our framework simplifies face image frontalization by transforming it into a face image completion task. During inference or testing, it employs a reliable pre-trained rendering model to obtain a frontal-view face image, which may have several regions with missing texture due to pose variations and occlusion. Our framework then uses a novel self-supervised Random Mask Attention Generative Adversarial Network (RMAGAN) to fill in these missing regions by treating them as randomly masked regions. Furthermore, the proposed Mask Rotate framework uses a reliable post-processing model designed to improve the visual quality of the face images after frontalization. In comprehensive experiments, the Mask Rotate framework eliminates the requirement for complex computations during training and achieves strong qualitative and quantitative results compared to the state of the art.
Citations: 0
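Turning frontalization into completion hinges on training the GAN to fill randomly masked regions of frontal faces. A minimal sketch of such a random-mask generator follows; the rectangle counts and sizes are arbitrary choices, not the paper's masking distribution.

```python
import torch

def random_mask(batch: int, h: int, w: int, n_rects: int = 3,
                max_frac: float = 0.3) -> torch.Tensor:
    """Binary masks (1 = keep, 0 = hole) with random rectangles removed,
    standing in for the texture holes a rendering model leaves behind."""
    mask = torch.ones(batch, 1, h, w)
    for b in range(batch):
        for _ in range(n_rects):
            rh = torch.randint(1, int(h * max_frac) + 1, (1,)).item()
            rw = torch.randint(1, int(w * max_frac) + 1, (1,)).item()
            y = torch.randint(0, h - rh + 1, (1,)).item()
            x = torch.randint(0, w - rw + 1, (1,)).item()
            mask[b, :, y:y + rh, x:x + rw] = 0.0
    return mask

faces = torch.rand(4, 3, 128, 128)       # frontal training faces
m = random_mask(4, 128, 128)
masked = faces * m                        # generator input: face with holes
print(masked.shape, float(m.mean()))      # completion target is `faces` itself
```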
Spcolor: Semantic prior guided exemplar-based image colorization
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111109 · Vol. 159, Article 111109
Siqi Chen, Xianlin Zhang, Mingdao Wang, Xueming Li, Yu Zhang, Yue Zhang
Abstract: Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between the two images. Previous methods directly search for correspondence over the entire reference image, and this type of global matching is prone to mismatch. Intuitively, a reasonable correspondence should be established between objects that are semantically similar. Motivated by this, we introduce the idea of a semantic prior and propose SPColor, a semantic prior guided exemplar-based image colorization framework. Several novel components are systematically designed in SPColor, including a semantic prior guided correspondence network (SPC), a category reduction algorithm (CRA), and a similarity masked perceptual loss (SMP loss). Unlike previous methods, SPColor establishes correspondence between pixels of the same semantic class locally. In this way, improper correspondence between different semantic classes is explicitly excluded, and mismatch is clearly alleviated. In addition, SPColor supports region-level class assignments before SPC in the pipeline. With this feature, a category manipulation process (CMP) is proposed as an interactive interface to control colorization, which can also produce more varied colorization results and improve the flexibility of reference selection. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on public datasets. Our code is available at https://github.com/viector/spcolor.
Citations: 0
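The core of the SPC idea is restricting correspondence to pixels of the same semantic class, which can be expressed as masking a feature-affinity matrix before the softmax. A simplified sketch of that mechanism follows; the flattened pixel features and hard class labels are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def semantic_guided_correspondence(ft, fr, seg_t, seg_r, tau: float = 0.01):
    """Match target features to reference features within the same semantic
    class (a simplified reading of SPColor's SPC module).

    ft: (N, C) target pixel features; fr: (M, C) reference pixel features;
    seg_t: (N,) and seg_r: (M,) semantic class ids.
    Returns (N, M) attention weights that are zero across classes.
    """
    ft = F.normalize(ft, dim=1)
    fr = F.normalize(fr, dim=1)
    affinity = ft @ fr.T / tau                          # cosine similarity
    same_class = seg_t[:, None] == seg_r[None, :]       # (N, M) semantic prior
    affinity = affinity.masked_fill(~same_class, float("-inf"))
    attn = torch.softmax(affinity, dim=1)
    return torch.nan_to_num(attn)   # rows with no same-class pixel become 0

ft, fr = torch.randn(100, 64), torch.randn(200, 64)
seg_t = torch.randint(0, 5, (100,))
seg_r = torch.randint(0, 5, (200,))
w = semantic_guided_correspondence(ft, fr, seg_t, seg_r)
print(w.shape, w.sum(dim=1)[:3])  # rows sum to ~1 where a match exists
```

Reference colors could then be transferred as `w @ ref_ab`, where `ref_ab` holds the reference chrominance per pixel.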
Enhancing robust VQA via contrastive and self-supervised learning
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111129 · Vol. 159, Article 111129
Runlin Cao, Zhixin Li, Zhenjun Tang, Canlong Zhang, Huifang Ma
Abstract: Visual Question Answering (VQA) aims to evaluate the reasoning abilities of an intelligent agent using visual and textual information. However, recent research indicates that many VQA models rely primarily on learning the correlation between questions and answers in the training dataset rather than demonstrating actual reasoning ability. To address this limitation, we propose a novel training approach called Enhancing Robust VQA via Contrastive and Self-supervised Learning (CSL-VQA) to construct a more robust VQA model. Our approach involves generating two types of negative samples to balance the biased data, using self-supervised auxiliary tasks to help the base VQA model overcome language priors, and filtering out biased training samples. In addition, we construct positive samples by removing spurious correlations in biased samples and perform auxiliary training through contrastive learning. Our approach requires no additional annotations and is compatible with different VQA backbones. Experimental results demonstrate that CSL-VQA significantly outperforms current state-of-the-art approaches, achieving an accuracy of 62.30% on the VQA-CP v2 dataset while maintaining robust performance on the in-distribution VQA v2 dataset. Moreover, our method shows superior generalization capabilities on challenging datasets such as GQA-OOD and VQA-CE, proving its effectiveness in reducing language bias and enhancing the overall robustness of VQA models.
Citations: 0
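The contrastive component can be illustrated with a standard InfoNCE loss over one constructed positive and several constructed negatives per anchor; how CSL-VQA actually builds those samples is described in the abstract, and only the loss form is sketched here.

```python
import torch
import torch.nn.functional as F

def vqa_contrastive_loss(anchor, positive, negatives, tau: float = 0.1):
    """InfoNCE-style loss: pull the anchor's fused (image, question) embedding
    toward its debiased positive and away from constructed negatives.

    anchor, positive: (B, D); negatives: (B, K, D).
    """
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    pos = (a * p).sum(-1, keepdim=True)             # (B, 1) positive similarity
    neg = torch.einsum("bd,bkd->bk", a, n)          # (B, K) negative similarities
    logits = torch.cat([pos, neg], dim=1) / tau
    target = torch.zeros(len(a), dtype=torch.long)  # positive sits at index 0
    return F.cross_entropy(logits, target)

a, p = torch.randn(8, 256), torch.randn(8, 256)
negs = torch.randn(8, 4, 256)
print(vqa_contrastive_loss(a, p, negs).item())
```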
TransMatch: Transformer-based correspondence pruning via local and global consensus
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111120 · Vol. 159, Article 111120
Yizhang Liu, Yanping Li, Shengjie Zhao
Abstract: Correspondence pruning aims to filter out false correspondences (a.k.a. outliers) from the initial feature correspondence set, which is pivotal to matching-based vision tasks such as image registration. To solve this problem, most existing learning-based methods typically use a multilayer perceptron framework and several well-designed modules to capture local and global contexts. However, few studies have explored how local and global consensuses interact to form cohesive feature representations. This paper proposes a novel framework called TransMatch, which leverages the full power of the Transformer structure to extract richer features and facilitate progressive local and global consensus learning. Beyond enhancing feature learning, the Transformer is used as a powerful tool to connect the two consensuses. Benefiting from the Transformer, our TransMatch is surprisingly effective at differentiating correspondences. Experimental results on correspondence pruning and camera pose estimation demonstrate that the proposed TransMatch outperforms other state-of-the-art methods by a large margin. The code will be available at https://github.com/lyz8023lyp/TransMatch/.
Citations: 0
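The abstract's core ingredient, self-attention over the set of putative matches, can be sketched with an off-the-shelf Transformer encoder that embeds each 4D correspondence and scores it as inlier or outlier. The layer sizes and the plain linear embedding below are illustrative choices, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CorrespondencePruner(nn.Module):
    """Toy Transformer-based pruner: embeds each putative match
    (x1, y1, x2, y2), lets self-attention gather context across the
    whole set, and scores every match as inlier/outlier."""
    def __init__(self, dim: int = 128, heads: int = 4, layers: int = 3):
        super().__init__()
        self.embed = nn.Linear(4, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)
        self.head = nn.Linear(dim, 1)

    def forward(self, matches: torch.Tensor) -> torch.Tensor:
        # matches: (B, N, 4) normalized keypoint pairs -> (B, N) inlier logits
        return self.head(self.encoder(self.embed(matches))).squeeze(-1)

model = CorrespondencePruner()
matches = torch.rand(2, 500, 4)
logits = model(matches)
inliers = logits.sigmoid() > 0.5          # pruned correspondence set
print(logits.shape, inliers.float().mean().item())
```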
L2T-DFM: Learning to Teach with Dynamic Fused Metric
IF 7.5 · CAS Q1 · Computer Science
Pattern Recognition · Pub Date: 2024-11-02 · DOI: 10.1016/j.patcog.2024.111124 · Vol. 159, Article 111124
Zhaoyang Hai, Liyuan Pan, Xiabi Liu, Mengqiao Han
Abstract: The loss function plays a crucial role in the construction of machine learning algorithms. Employing a teacher model to dynamically set loss functions for student models has attracted attention. In existing works, (1) the characterization of the dynamic loss suffers from some inherent limitations, i.e., the computational cost of loss networks and the restricted similarity measurement of handcrafted loss functions; and (2) the states of the student model are provided to the teacher model directly, without integration, causing the teacher model to underperform when trained on insufficient amounts of data. To alleviate these issues, in this paper we select and weigh a set of similarity metrics via a confidence-based selection algorithm and a temporal teacher model to enhance the dynamic loss functions. Subsequently, to integrate the states of the student model, we employ statistics to quantify the information loss of the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation.
Citations: 0
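A dynamically fused loss can be illustrated as a weighted combination of several similarity metrics between predictions and targets, with the weights supplied externally (in the paper, by the teacher via confidence-based selection). The three metrics chosen below are common examples, not necessarily the paper's set.

```python
import torch
import torch.nn.functional as F

def fused_dynamic_loss(logits, target, weights):
    """Weighted fusion of several similarity metrics into one loss.

    In the paper the weights come from a teacher model; here they are a
    plain input, purely for illustration. weights: 3 non-negative values.
    """
    p = F.softmax(logits, dim=1)
    onehot = F.one_hot(target, logits.size(1)).float()
    ce = F.cross_entropy(logits, target)                     # cross-entropy
    mse = ((p - onehot) ** 2).sum(dim=1).mean()              # Brier score
    cos = 1 - F.cosine_similarity(p, onehot, dim=1).mean()   # cosine distance
    return weights[0] * ce + weights[1] * mse + weights[2] * cos

logits = torch.randn(16, 10)
target = torch.randint(0, 10, (16,))
w = torch.tensor([0.6, 0.3, 0.1])   # e.g. set by the teacher at each step
print(fused_dynamic_loss(logits, target, w).item())
```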