Exploring the Efficiency of Batch Active Learning for Human-in-the-Loop Relation Extraction

Ismini Lourentzou, D. Gruhl, Steve Welch
{"title":"Exploring the Efficiency of Batch Active Learning for Human-in-the-Loop Relation Extraction","authors":"Ismini Lourentzou, D. Gruhl, Steve Welch","doi":"10.1145/3184558.3191546","DOIUrl":null,"url":null,"abstract":"Domain-specific relation extraction requires training data for supervised learning models, and thus, significant labeling effort. Distant supervision is often leveraged for creating large annotated corpora however these methods require handling the inherent noise. On the other hand, active learning approaches can reduce the annotation cost by selecting the most beneficial examples to label in order to learn a good model. The choice of examples can be performed sequentially, i.e. select one example in each iteration, or in batches, i.e. select a set of examples in each iteration. The optimization of the batch size is a practical problem faced in every real-world application of active learning, however it is often treated as a parameter decided in advance. In this work, we study the trade-off between model performance, the number of requested labels in a batch and the time spent in each round for real-time, domain specific relation extraction. Our results show that the use of an appropriate batch size produces competitive performance, even compared to a fully sequential strategy, while reducing the training time dramatically.","PeriodicalId":235572,"journal":{"name":"Companion Proceedings of the The Web Conference 2018","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Proceedings of the The Web Conference 2018","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3184558.3191546","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

Domain-specific relation extraction requires training data for supervised learning models, and thus, significant labeling effort. Distant supervision is often leveraged to create large annotated corpora; however, these methods require handling the inherent noise. On the other hand, active learning approaches can reduce the annotation cost by selecting the most beneficial examples to label in order to learn a good model. The choice of examples can be performed sequentially, i.e., selecting one example in each iteration, or in batches, i.e., selecting a set of examples in each iteration. The optimization of the batch size is a practical problem faced in every real-world application of active learning; however, it is often treated as a parameter decided in advance. In this work, we study the trade-off between model performance, the number of requested labels in a batch, and the time spent in each round for real-time, domain-specific relation extraction. Our results show that the use of an appropriate batch size produces competitive performance, even compared to a fully sequential strategy, while reducing the training time dramatically.
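The paper reports an empirical study rather than code. As a rough illustration of the loop the abstract describes, below is a minimal sketch of batch active learning with pool-based uncertainty sampling. The function name, the `oracle` callback, the least-confidence selection criterion, and the logistic-regression model are all illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a batch active-learning loop (assumptions noted above;
# not the authors' implementation).
import numpy as np
from sklearn.linear_model import LogisticRegression

def batch_active_learning(X_labeled, y_labeled, X_pool, oracle,
                          batch_size=10, rounds=20):
    """Query `oracle` for labels on the `batch_size` most uncertain pool
    examples each round, retraining the model after every batch.
    Assumes the initial labeled seed set covers all classes."""
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        if len(X_pool) == 0:
            break
        model.fit(X_labeled, y_labeled)
        # Least-confidence uncertainty: low max class probability = uncertain.
        probs = model.predict_proba(X_pool)
        uncertainty = 1.0 - probs.max(axis=1)
        query = np.argsort(uncertainty)[-batch_size:]
        # Ask the human annotator (oracle) to label the selected batch.
        y_new = np.array([oracle(x) for x in X_pool[query]])
        X_labeled = np.vstack([X_labeled, X_pool[query]])
        y_labeled = np.concatenate([y_labeled, y_new])
        X_pool = np.delete(X_pool, query, axis=0)
    return model
```

Setting `batch_size=1` recovers the fully sequential strategy the abstract compares against, while larger batches amortize the retraining cost over more labels per round, which is the performance/time trade-off the paper studies.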