Gradient-guided channel masking for cross-domain few-shot learning
Siqi Hui, Sanping Zhou, Ye Deng, Yang Wu, Jinjun Wang
Knowledge-Based Systems, published 2024-10-09. DOI: 10.1016/j.knosys.2024.112548
https://www.sciencedirect.com/science/article/pii/S0950705124011821
Citations: 0
Abstract
Cross-Domain Few-Shot Learning (CD-FSL) addresses few-shot learning under a domain gap between source and target domains, enabling the transfer of knowledge from a source domain to a target domain with only limited labeled samples. Current approaches often incorporate an auxiliary target dataset containing a few labeled samples to enhance model generalization on specific target domains. However, we observe that many models retain a substantial number of channels that learn source-specific knowledge, extracting features that perform adequately on the source domain but generalize poorly to the target domain. This source-specific knowledge often compromises target-domain performance. To address this challenge, we introduce a novel framework, Gradient-Guided Channel Masking (GGCM), designed for CD-FSL to prevent model channels from acquiring too much source-specific knowledge. GGCM quantifies each channel's contribution to solving target tasks using gradients of the target loss and identifies channels with smaller gradients as source-specific. These channels are then masked during the forward propagation of source features to limit the learning of source-specific knowledge. Conversely, GGCM mutes the non-source-specific channels during the forward propagation of target features, forcing the model to depend on the source-specific channels and thereby enhancing their generalizability. Moreover, we propose a consistency loss that aligns the predictions made by the source-specific channels with those made by the entire model; this lets those channels learn from the generalizable knowledge contained in the non-source-specific channels and further improves their generalizability. Validated across multiple CD-FSL benchmark datasets, our framework achieves state-of-the-art performance and effectively suppresses the learning of source-specific knowledge.
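The masking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-channel gradient magnitudes of the target loss are already available and that a fixed fraction of channels is treated as source-specific (the function name, the `source_specific_ratio` parameter, and the simple bottom-k rule are assumptions for illustration).

```python
import numpy as np

def ggcm_masks(channel_grads, source_specific_ratio=0.5):
    """Split channels into source-specific (small target-loss gradients)
    and generalizable (large gradients), returning complementary 0/1 masks.

    Hypothetical helper sketching the GGCM masking idea; the name, the
    ratio parameter, and the bottom-k selection rule are assumptions.
    """
    channel_grads = np.asarray(channel_grads, dtype=float)
    k = int(len(channel_grads) * source_specific_ratio)
    # Channels with the smallest gradient magnitudes contribute least to
    # the target task and are treated as source-specific.
    order = np.argsort(np.abs(channel_grads))
    source_specific = np.zeros(len(channel_grads), dtype=bool)
    source_specific[order[:k]] = True
    # Mask source-specific channels when forwarding source features ...
    source_mask = (~source_specific).astype(float)
    # ... and mask the remaining channels when forwarding target features,
    # forcing the source-specific channels to learn target-relevant cues.
    target_mask = source_specific.astype(float)
    return source_mask, target_mask

grads = [0.05, 0.9, 0.1, 1.2]          # per-channel |d target-loss / d channel|
src_mask, tgt_mask = ggcm_masks(grads)
# src_mask → [0., 1., 0., 1.]; tgt_mask → [1., 0., 1., 0.]
```

In a real model the masks would multiply feature maps channel-wise during the respective forward passes, and the gradient statistics would come from backpropagating the target-task loss.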
Journal introduction:
Knowledge-Based Systems is an international and interdisciplinary journal in artificial intelligence that publishes original, innovative, and creative research results in the field, focusing on systems built with knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligent models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.