基于聚类-标记的少镜头学习方法及其在图像数据自动标注中的应用

ACM Journal of Data and Information Quality (JDIQ) Pub Date : 2022-03-04 DOI:10.1145/3491232

Renzhi Wu, Nilaksh Das, Sanya Chaba, Sakshi Gandhi, Duen Horng Chau, Xu Chu

{"title":"基于聚类-标记的少镜头学习方法及其在图像数据自动标注中的应用","authors":"Renzhi Wu, Nilaksh Das, Sanya Chaba, Sakshi Gandhi, Duen Horng Chau, Xu Chu","doi":"10.1145/3491232","DOIUrl":null,"url":null,"abstract":"Few-shot learning (FSL) aims at learning to generalize from only a small number of labeled examples for a given target task. Most current state-of-the-art FSL methods typically have two limitations. First, they usually require access to a source dataset (in a similar domain) with abundant labeled examples, which may not always be possible due to privacy concerns and copyright issues. Second, they typically do not offer any estimation of the generalization error on the target FSL task, because the handful of labeled examples must be used for training and cannot spare a validation subset. In this article, we propose a cluster-then-label approach to perform few-shot learning. Our approach does not require access to the labeled source dataset and provides an estimation of generalization error. We show empirically, on four benchmark datasets, that our approach provides competitive predictive performance to state-of-the-art FSL approaches and our generalization error estimation is accurate. Finally, we explore the application of our proposed method to automatic image data labeling. We compare our method with existing automatic data labeling systems. The end-to-end performance of our method outperforms the state-of-the-art automatic data labeling system Snuba by 26% and is only 7% away from the fully supervised upper bound.","PeriodicalId":299504,"journal":{"name":"ACM Journal of Data and Information Quality (JDIQ)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"A Cluster-then-label Approach for Few-shot Learning with Application to Automatic Image Data Labeling\",\"authors\":\"Renzhi Wu, Nilaksh Das, Sanya Chaba, Sakshi Gandhi, Duen Horng Chau, Xu Chu\",\"doi\":\"10.1145/3491232\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Few-shot learning (FSL) aims at learning to generalize from only a small number of labeled examples for a given target task. Most current state-of-the-art FSL methods typically have two limitations. First, they usually require access to a source dataset (in a similar domain) with abundant labeled examples, which may not always be possible due to privacy concerns and copyright issues. Second, they typically do not offer any estimation of the generalization error on the target FSL task, because the handful of labeled examples must be used for training and cannot spare a validation subset. In this article, we propose a cluster-then-label approach to perform few-shot learning. Our approach does not require access to the labeled source dataset and provides an estimation of generalization error. We show empirically, on four benchmark datasets, that our approach provides competitive predictive performance to state-of-the-art FSL approaches and our generalization error estimation is accurate. Finally, we explore the application of our proposed method to automatic image data labeling. We compare our method with existing automatic data labeling systems. The end-to-end performance of our method outperforms the state-of-the-art automatic data labeling system Snuba by 26% and is only 7% away from the fully supervised upper bound.\",\"PeriodicalId\":299504,\"journal\":{\"name\":\"ACM Journal of Data and Information Quality (JDIQ)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Journal of Data and Information Quality (JDIQ)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3491232\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Journal of Data and Information Quality (JDIQ)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3491232","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

少射学习(FSL)的目的是学习仅从少量标记的例子中对给定的目标任务进行泛化。目前最先进的FSL方法通常有两个限制。首先，它们通常需要访问具有大量标记示例的源数据集(在类似的领域中)，由于隐私问题和版权问题，这可能并不总是可能的。其次，它们通常不提供对目标FSL任务的泛化误差的任何估计，因为必须使用少量标记的示例进行训练，而不能腾出验证子集。在本文中，我们提出了一种聚类-标签方法来执行少镜头学习。我们的方法不需要访问标记的源数据集，并提供了泛化误差的估计。在四个基准数据集上，我们的经验表明，我们的方法比最先进的FSL方法提供了有竞争力的预测性能，我们的泛化误差估计是准确的。最后，探讨了该方法在图像数据自动标注中的应用。我们将我们的方法与现有的自动数据标记系统进行了比较。我们的方法的端到端性能比最先进的自动数据标记系统Snuba高出26%，距离完全监督的上限只有7%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Cluster-then-label Approach for Few-shot Learning with Application to Automatic Image Data Labeling

Few-shot learning (FSL) aims at learning to generalize from only a small number of labeled examples for a given target task. Most current state-of-the-art FSL methods typically have two limitations. First, they usually require access to a source dataset (in a similar domain) with abundant labeled examples, which may not always be possible due to privacy concerns and copyright issues. Second, they typically do not offer any estimation of the generalization error on the target FSL task, because the handful of labeled examples must be used for training and cannot spare a validation subset. In this article, we propose a cluster-then-label approach to perform few-shot learning. Our approach does not require access to the labeled source dataset and provides an estimation of generalization error. We show empirically, on four benchmark datasets, that our approach provides competitive predictive performance to state-of-the-art FSL approaches and our generalization error estimation is accurate. Finally, we explore the application of our proposed method to automatic image data labeling. We compare our method with existing automatic data labeling systems. The end-to-end performance of our method outperforms the state-of-the-art automatic data labeling system Snuba by 26% and is only 7% away from the fully supervised upper bound.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Journal of Data and Information Quality (JDIQ)

自引率

0.00%

发文量