Boosting active domain adaptation with exploration of samples
Authors: Qing Tian, Heng Zhang
Journal: Intelligent Data Analysis (JCR Q4, Computer Science, Artificial Intelligence; impact factor 0.9)
DOI: 10.3233/ida-230150 (https://doi.org/10.3233/ida-230150)
Published: 2023-09-14
Publication type: Journal Article
Citations: 0
Abstract
Active learning is increasingly being adopted to assist domain adaptation. However, because of domain shift, traditional active learning methods that originate from semi-supervised scenarios cannot be applied directly to domain adaptation. Active domain adaptation (Active DA) has been proposed as a new domain adaptation paradigm to address this problem; it aims to improve model performance by annotating a small number of target-domain samples. We propose an Active DA method named Boosting Active Domain Adaptation with Exploration of Samples (BADA), which divides Active DA into two related issues: sample selection and sample utilization. For sample selection, we design an instability selection criterion based on predictive consistency, together with a diversity selection criterion. For the remaining unlabeled samples, we design a self-training framework that separates reliable from unreliable samples through a screening mechanism similar to the selection criteria, and we apply a separate loss function to each group. Experiments show that BADA remarkably outperforms previous active learning methods and Active DA methods on several domain adaptation datasets.
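The abstract names the components of BADA but gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that instability can be measured as disagreement among predictions on several perturbed views of each target sample, that diversity can be approximated by greedy farthest-point selection in feature space, and that samples with fully consistent predictions count as reliable. All function names, the `pool_factor` parameter, and the threshold are hypothetical.

```python
import numpy as np


def instability_scores(probs_views):
    """Assumed instability measure: 1 minus the share of views that agree with the majority label.

    probs_views has shape (V, N, C): class probabilities for N samples under V perturbed views.
    """
    preds = probs_views.argmax(axis=2)  # (V, N) predicted label per view
    n_views, n_classes = preds.shape[0], probs_views.shape[2]
    # votes[i, c] = number of views that predict class c for sample i
    votes = np.stack([(preds == c).sum(axis=0) for c in range(n_classes)], axis=1)
    return 1.0 - votes.max(axis=1) / n_views


def select_for_annotation(scores, features, budget, pool_factor=3):
    """Take the most unstable candidates, then greedily pick mutually distant ones."""
    if budget <= 0 or len(scores) == 0:
        return np.array([], dtype=int)
    pool_size = min(len(scores), budget * pool_factor)
    pool = np.argsort(-scores, kind="stable")[:pool_size]  # most unstable first
    chosen = [int(pool[0])]
    min_dist = np.linalg.norm(features[pool] - features[pool[0]], axis=1)
    min_dist[0] = -np.inf  # never pick the same candidate twice
    while len(chosen) < min(budget, pool_size):
        nxt = int(min_dist.argmax())  # candidate farthest from everything chosen so far
        chosen.append(int(pool[nxt]))
        dist = np.linalg.norm(features[pool] - features[pool[nxt]], axis=1)
        min_dist = np.minimum(min_dist, dist)
        min_dist[nxt] = -np.inf
    return np.array(chosen)


def split_reliable(scores, threshold=0.0):
    """Assumed screening rule: samples at or below the threshold are treated as reliable."""
    reliable = np.flatnonzero(scores <= threshold)
    unreliable = np.flatnonzero(scores > threshold)
    return reliable, unreliable


# Toy usage: 3 views, 4 samples, 2 classes, with one-hot "probabilities".
view_preds = np.array([[0, 0, 1, 1],
                       [0, 1, 1, 0],
                       [0, 0, 1, 1]])
probs_views = np.eye(2)[view_preds]  # shape (3, 4, 2)
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])

scores = instability_scores(probs_views)
to_label = select_for_annotation(scores, features, budget=2)
reliable, unreliable = split_reliable(scores)
```

In a self-training loop of the kind the abstract describes, the indices in `to_label` would be sent for annotation, while `reliable` and `unreliable` would be routed to different loss functions; which losses BADA actually uses is not stated in the abstract.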
About the journal:
Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, the journal prefers papers that discuss the development of new AI-related data analysis architectures, methodologies, and techniques and their applications to various domains.