An Adaptive Pre-filtering Technique for Error-Reduction Sampling in Active Learning

2008 IEEE International Conference on Data Mining Workshops. Pub Date: 2008-12-15. DOI: 10.1109/ICDMW.2008.52

Michael Davy, S. Luz

Citations: 4

Abstract

Error-reduction sampling (ERS) is a high-performing (but computationally expensive) query selection strategy for active learning. Subset optimisation has been proposed to reduce computational expense by applying ERS to only a subset of examples from the pool. This paper compares techniques used to construct the subset, namely random sub-sampling and pre-filtering. We focus on pre-filtering, which populates the subset with more informative examples by filtering from the unlabelled pool using a query selection strategy. In this paper, we establish whether pre-filtering outperforms sub-sampling optimisation, examine the effect of subset size, and propose a novel adaptive pre-filtering technique which dynamically switches between several alternative pre-filtering techniques using a multi-armed bandit algorithm. Empirical evaluations conducted on two benchmark text categorisation datasets demonstrate that pre-filtered ERS achieves higher accuracy than sub-sampled ERS. The proposed adaptive pre-filtering technique is also shown to be competitive with the optimal pre-filtering technique on the majority of problems and is never the worst technique.
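
The subset-optimisation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything named here is an assumption made for illustration: the abstract does not say which query selection strategies serve as pre-filters, which bandit algorithm is used, or how the reward is defined. The sketch uses uncertainty sampling as one pre-filter, an epsilon-greedy bandit, and a caller-supplied expected_error function standing in for the expensive ERS estimate.

```python
import random


def random_subsample(pool_size, k, rng):
    """Sub-sampling: pick k pool indices uniformly at random."""
    return rng.sample(range(pool_size), k)


def uncertainty_prefilter(pool_probs, k):
    """Pre-filtering (assumed strategy): keep the k indices whose predicted
    positive-class probability is closest to 0.5."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: abs(pool_probs[i] - 0.5))
    return ranked[:k]


def ers_select(candidates, expected_error):
    """Score only the candidate subset with the expensive criterion and
    return the index with the lowest estimated future error."""
    return min(candidates, key=expected_error)


class EpsilonGreedyBandit:
    """Epsilon-greedy bandit over named arms (an assumed choice; the
    abstract only says 'a multi-armed bandit algorithm')."""

    def __init__(self, arms, epsilon=0.1, rng=None):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = rng or random.Random(0)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}

    def select(self):
        # Try every arm once before exploiting.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        # Explore with probability epsilon, otherwise pick the best mean reward.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# One toy active-learning iteration with stand-in values.
rng = random.Random(0)
pool_probs = [rng.random() for _ in range(1000)]  # toy classifier outputs
filters = {
    "uncertainty": lambda k: uncertainty_prefilter(pool_probs, k),
    "random": lambda k: random_subsample(len(pool_probs), k, rng),
}
bandit = EpsilonGreedyBandit(filters, epsilon=0.1, rng=rng)
arm = bandit.select()                      # choose a pre-filter
subset = filters[arm](10)                  # build the small candidate subset
query = ers_select(subset, lambda i: abs(pool_probs[i] - 0.5))  # toy score
bandit.update(arm, reward=1.0)             # hypothetical reward signal
```

The saving comes from ers_select calling the scoring function once per subset member rather than once per pool example. In a real system, that function would retrain the classifier for each candidate and possible label.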
