{"title":"Unsupervised Domain Adaptation Via Data Pruning","authors":"Andrea Napoli, Paul White","doi":"arxiv-2409.12076","DOIUrl":null,"url":null,"abstract":"The removal of carefully-selected examples from training data has recently\nemerged as an effective way of improving the robustness of machine learning\nmodels. However, the best way to select these examples remains an open\nquestion. In this paper, we consider the problem from the perspective of\nunsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA\nwhereby training examples are removed to attempt to align the training\ndistribution to that of the target data. By adopting the maximum mean\ndiscrepancy (MMD) as the criterion for alignment, the problem can be neatly\nformulated and solved as an integer quadratic program. We evaluate our approach\non a real-world domain shift task of bioacoustic event detection. As a method\nfor UDA, we show that AdaPrune outperforms related techniques, and is\ncomplementary to other UDA algorithms such as CORAL. Our analysis of the\nrelationship between the MMD and model accuracy, along with t-SNE plots,\nvalidate the proposed method as a principled and well-founded way of performing\ndata pruning.","PeriodicalId":501301,"journal":{"name":"arXiv - CS - Machine Learning","volume":"35 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.12076","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
The removal of carefully selected examples from training data has recently emerged as an effective way of improving the robustness of machine learning models. However, the best way to select these examples remains an open question. In this paper, we consider the problem from the perspective of unsupervised domain adaptation (UDA). We propose AdaPrune, a method for UDA in which training examples are removed so as to align the training distribution with that of the target data. By adopting the maximum mean discrepancy (MMD) as the criterion for alignment, the problem can be neatly formulated and solved as an integer quadratic program. We evaluate our approach on a real-world domain shift task of bioacoustic event detection. As a method for UDA, we show that AdaPrune outperforms related techniques and is complementary to other UDA algorithms such as CORAL. Our analysis of the relationship between the MMD and model accuracy, along with t-SNE plots, validates the proposed method as a principled and well-founded way of performing data pruning.
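
For concreteness, the following is a minimal sketch of how MMD-based pruning can be cast as an integer quadratic program, written in LaTeX from a reading of the abstract; the kernel k, the budget s, and the notation below are assumptions rather than the paper's exact formulation.

% Sketch under assumed notation: x_1, ..., x_n are source (training) examples,
% y_1, ..., y_m are target examples, k is a positive-definite kernel, and
% z_i = 1 if source example i is kept after pruning. Dropping the term in
% k(y_i, y_j), which is constant in z, the squared MMD between the retained
% subset and the target becomes a quadratic form in the binary vector z:
\begin{align}
  \min_{z \in \{0,1\}^n} \quad
    & \frac{1}{s^{2}}\, z^{\top} K^{(ss)} z
      \;-\; \frac{2}{s\,m}\, z^{\top} K^{(st)} \mathbf{1}_m \\
  \text{s.t.} \quad
    & \mathbf{1}_n^{\top} z = s,
\end{align}
% where K^{(ss)}_{ij} = k(x_i, x_j), K^{(st)}_{ij} = k(x_i, y_j), and s is the
% assumed number of training examples retained after pruning.

An objective of this form can be handed to an off-the-shelf mixed-integer quadratic solver, or relaxed to z in [0,1]^n and rounded; how the paper actually parameterises the budget and solves the program is not specified in the abstract.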