Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm

Zuobing Xu, Christopher Hogan, Robert S. Bauer
Published in: 2009 IEEE International Conference on Data Mining Workshops, 2009-12-06
DOI: 10.1109/ICDMW.2009.38
Cited by: 6

Abstract

Active learning algorithms select the training examples for which labels are requested from domain experts, which greatly reduces the human labeling effort required for supervised learning. To reduce training time, and to provide a more convenient environment for user interaction, it is necessary to select batches of new training examples rather than a single example at a time. Batch mode active learning algorithms incorporate a diversity measure to construct a batch of diverse candidate examples. Existing approaches rely on greedy algorithms, which are feasible at the scale of thousands of examples but not efficient enough to scale to larger real-world classification applications containing millions of examples. In this paper, we present an extremely efficient active learning algorithm. This new algorithm produces the same results as the traditional greedy algorithm while reducing the running time by a factor of several hundred. We prove that the objective function of the algorithm is submodular, which guarantees that it finds the same solution as the greedy algorithm. We evaluate our approach on several large-scale real-world text classification problems and show that it achieves substantial speedups while obtaining the same classification accuracy.
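The paper's own algorithm is not reproduced on this page, but the standard way to exploit a submodular objective for an identical-output speedup over plain greedy is lazy (priority-queue) evaluation of marginal gains. The sketch below is an illustration of that general technique, not the authors' implementation; the `gain` callback and the toy set-coverage objective in the usage note are hypothetical stand-ins for the paper's diversity-aware batch objective.

```python
import heapq

def lazy_greedy(candidates, gain, k):
    """Pick k candidates maximizing a submodular objective by lazy greedy.

    gain(item, selected) is the marginal gain of adding item to the
    current selection. Submodularity means marginal gains can only
    shrink as the selection grows, so a stale cached gain is a valid
    upper bound: only the current heap top ever needs re-evaluation,
    while plain greedy rescans every candidate in every round.
    """
    selected = []
    # Max-heap of (-cached_gain, tie_break, item); gains w.r.t. the empty set.
    heap = [(-gain(x, selected), i, x) for i, x in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, i, x = heapq.heappop(heap)
        fresh = gain(x, selected)  # re-evaluate only the top item
        if not heap or fresh >= -heap[0][0]:
            # Cached bound was still tight: x is the true argmax this round.
            selected.append(x)
        else:
            # Gain shrank below the next cached bound: push back and retry.
            heapq.heappush(heap, (-fresh, i, x))
    return selected
```

As a usage example, with a set-coverage objective (which is submodular) the marginal gain of an item is the number of elements it covers beyond the current selection; lazy greedy then returns exactly the sequence plain greedy would, which is the "same solution" guarantee the abstract refers to.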