Greedy is not Enough: An Efficient Batch Mode Active Learning Algorithm

Zuobing Xu, Christopher Hogan, Robert S. Bauer
Published in: 2009 IEEE International Conference on Data Mining Workshops, 2009-12-06
DOI: 10.1109/ICDMW.2009.38
Cited by: 6

Abstract

Active learning algorithms select the training examples for which labels are requested from domain experts, which greatly reduces the human labeling effort required for supervised learning. To reduce training time, and to provide a more convenient environment for user interaction, it is necessary to select batches of new training examples rather than a single example at a time. Batch mode active learning algorithms incorporate a diversity measure to construct a batch of diverse candidate examples. Existing approaches rely on greedy algorithms, which are feasible at the scale of thousands of examples but not efficient enough to scale to larger real-world classification applications containing millions of examples. In this paper, we present an extremely efficient active learning algorithm. This new algorithm produces the same results as the traditional greedy algorithm while reducing the running time by a factor of several hundred. We prove that the objective function of the algorithm is submodular, which guarantees that it finds the same solution as the greedy algorithm. We evaluate our approach on several large-scale real-world text classification problems and show that it achieves substantial speedups while obtaining the same classification accuracy.
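The paper's own algorithm is not reproduced on this page, but the standard way to exploit a submodular objective for an identical-output speedup over plain greedy is lazy (priority-queue) evaluation of marginal gains. The sketch below is an illustration of that general technique, not the authors' implementation; the `gain` callback and the toy set-coverage objective in the usage note are hypothetical stand-ins for the paper's diversity-aware batch objective.

```python
import heapq

def lazy_greedy(candidates, gain, k):
    """Pick k candidates maximizing a submodular objective by lazy greedy.

    gain(item, selected) is the marginal gain of adding item to the
    current selection. Submodularity means marginal gains can only
    shrink as the selection grows, so a stale cached gain is a valid
    upper bound: only the current heap top ever needs re-evaluation,
    while plain greedy rescans every candidate in every round.
    """
    selected = []
    # Max-heap of (-cached_gain, tie_break, item); gains w.r.t. the empty set.
    heap = [(-gain(x, selected), i, x) for i, x in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < k:
        _, i, x = heapq.heappop(heap)
        fresh = gain(x, selected)  # re-evaluate only the top item
        if not heap or fresh >= -heap[0][0]:
            # Cached bound was still tight: x is the true argmax this round.
            selected.append(x)
        else:
            # Gain shrank below the next cached bound: push back and retry.
            heapq.heappush(heap, (-fresh, i, x))
    return selected
```

As a usage example, with a set-coverage objective (which is submodular) the marginal gain of an item is the number of elements it covers beyond the current selection; lazy greedy then returns exactly the sequence plain greedy would, which is the "same solution" guarantee the abstract refers to.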