Partial Domain Adaptation via Importance Sampling-Based Shift Correction

IF 13.7

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society Pub Date : 2025-08-01 DOI:10.1109/TIP.2025.3593115

Cheng-Jun Guo;Chuan-Xian Ren;You-Wei Luo;Xiao-Lin Xu;Hong Yan

Volume 34, pages 5009-5022. Full text: https://ieeexplore.ieee.org/document/11107265/

Citations: 0

Abstract

Partial domain adaptation (PDA) is a challenging task in real-world machine learning scenarios. It aims to transfer knowledge from a labeled source domain to a related unlabeled target domain, where the support set of the source label distribution subsumes that of the target. Previous PDA works correct the label distribution shift by weighting samples in the source domain. However, simple reweighting cannot explore the latent structure or make sufficient use of the labeled data, so models are prone to over-fitting on the source domain. In this work, we propose a novel importance sampling-based shift correction (IS2C) method, in which new labeled data are sampled from a constructed sampling domain, whose label distribution is supposed to be the same as that of the target domain, to characterize the latent structure and enhance the generalization ability of the model. We provide theoretical guarantees for IS2C by proving that the generalization error can be sufficiently dominated by IS2C. In particular, by implementing sampling with the mixture distribution, the extent of shift between the source and sampling domains can be connected to the generalization error, which provides an interpretable way to build IS2C. To improve knowledge transfer, an optimal transport-based independence criterion is proposed for conditional distribution alignment, where the computation of the criterion can be adjusted to reduce the complexity from $\mathcal{O}(n^{3})$ to $\mathcal{O}(n^{2})$ in realistic PDA scenarios. Extensive experiments on PDA benchmarks validate the theoretical results and demonstrate the effectiveness of our IS2C over existing methods.
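The idea of drawing labeled data from a sampling domain whose label distribution matches the target can be illustrated with a minimal sketch. This is not the authors' IS2C implementation: the abstract does not specify how the target label distribution is estimated or how the mixture is formed, so the averaged-prediction estimator, the mixing weight `lam`, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def estimate_target_label_dist(target_probs):
    """Assumed estimator: average the classifier's softmax outputs on target data."""
    p = np.asarray(target_probs, dtype=float).mean(axis=0)
    return p / p.sum()

def mixture_label_dist(source_labels, target_prior, num_classes, lam=0.5):
    """Hypothetical mixture of the estimated target prior and the source prior."""
    source_labels = np.asarray(source_labels)
    p_src = np.bincount(source_labels, minlength=num_classes) / len(source_labels)
    return lam * np.asarray(target_prior, dtype=float) + (1.0 - lam) * p_src

def sample_from_sampling_domain(source_labels, label_dist, n_samples, rng):
    """Resample source indices so the drawn labels follow label_dist."""
    source_labels = np.asarray(source_labels)
    label_dist = np.asarray(label_dist, dtype=float)
    p_src = np.bincount(source_labels, minlength=len(label_dist)) / len(source_labels)
    # Class-level importance weight w(y) = q(y) / p_s(y); classes absent
    # from the source get weight 0 to avoid division by zero.
    w = np.divide(label_dist, p_src, out=np.zeros_like(label_dist), where=p_src > 0)
    probs = w[source_labels]
    probs = probs / probs.sum()
    return rng.choice(len(source_labels), size=n_samples, replace=True, p=probs)

# Toy usage: 4 source classes, target mass only on classes 0 and 1.
rng = np.random.default_rng(0)
source_labels = np.repeat(np.arange(4), 25)
target_probs = np.array([[0.7, 0.3, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]])
target_prior = estimate_target_label_dist(target_probs)
q = mixture_label_dist(source_labels, target_prior, num_classes=4, lam=1.0)
idx = sample_from_sampling_domain(source_labels, q, n_samples=200, rng=rng)
sampled_labels = source_labels[idx]
```

With `lam=1.0` the sampling domain follows the estimated target prior, so source-only classes are never drawn; smaller values of `lam` keep some source-only classes in the sample, which loosely mirrors the trade-off between shift and data usage that the abstract attributes to the mixture distribution.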


Source journal

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Number of publications