Active Learning Strategies Based on Text Informativeness

2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT). Pub Date: 2022-11-01. DOI: 10.1109/WI-IAT55865.2022.00015

Ruide Li, Yoko Yamakata, Keishi Tajima



Abstract

In this paper, we propose strategies for selecting the next item to label in active learning for text data. Text data have several text-specific features, such as TF-IDF vectors and document embeddings. Because these features correlate with the informativeness of a text, our methods use them to select the next item to label. We evaluate our strategies in two problem settings: the standard active learning setting, where the focus is on improving model accuracy, and the learning-to-enumerate setting, where the focus is on efficiently enumerating all instances of a given target class. We also combine our strategies with two existing strategies: uncertainty sampling, a well-known strategy for active learning, and the exploitation-only strategy, which is used in learning-to-enumerate problems. Our experiments on two publicly available English text datasets show that our methods outperform the baseline methods in both problem settings.
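The abstract does not spell out the scoring functions, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a binary classifier that already supplies a target-class probability for each unlabeled text, and it uses the sum of a text's TF-IDF weights as a stand-in informativeness score. The function names, the `alpha` mixing weight, and the `base` switch are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tfidf_informativeness(texts):
    """Return one score per text: the sum of its TF-IDF weights.

    Assumption: this sum is only a stand-in for "informativeness";
    the abstract does not specify the score the paper uses.
    """
    tokenised = [text.lower().split() for text in texts]
    n_docs = len(tokenised)
    # Document frequency: how many texts contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Raw term count times a smoothed inverse document frequency.
        scores.append(sum(
            count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        ))
    return scores


def select_next(texts, probs, alpha=0.5, base="uncertainty"):
    """Return the index of the unlabeled text to label next.

    probs[i] is the model's predicted probability that texts[i]
    belongs to the target class (hypothetical input).
    base="uncertainty" prefers probabilities near 0.5;
    base="exploitation" prefers the highest probability.
    alpha mixes the base score with the text-informativeness score.
    """
    info = tfidf_informativeness(texts)
    max_info = max(info) or 1.0  # avoid dividing by zero for empty texts
    best_index, best_score = 0, float("-inf")
    for i, p in enumerate(probs):
        if base == "uncertainty":
            base_score = 1.0 - abs(2.0 * p - 1.0)
        else:
            base_score = p
        score = alpha * base_score + (1.0 - alpha) * info[i] / max_info
        if score > best_score:
            best_index, best_score = i, score
    return best_index


pool = ["spam spam spam", "rare unusual vocabulary appears here", "spam offer"]
predicted = [0.5, 0.9, 0.1]
print(select_next(pool, predicted, alpha=0.5))
```

Setting `alpha=1.0` reduces the sketch to plain uncertainty sampling (or to exploitation-only selection with `base="exploitation"`), while `alpha=0.0` ranks items purely by the text-based score. The linear mix is just one simple way to combine the two signals.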
