Active Learning Strategies Based on Text Informativeness

Ruide Li, Yoko Yamakata, Keishi Tajima
Published in: 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)
Publication date: 2022-11-01
DOI: 10.1109/WI-IAT55865.2022.00015 (https://doi.org/10.1109/WI-IAT55865.2022.00015)
Citations: 0

Abstract

In this paper, we propose strategies for selecting the next item to label in active learning for text data. Text data have several text-specific features, such as TF-IDF vectors and document embeddings. These features correlate with the informativeness of the text data, so our methods use them to select the next item to label. We evaluate our strategies in two problem settings: the standard active learning setting, where the goal is to improve model accuracy, and the learning-to-enumerate setting, where the goal is to enumerate all instances of a given target class efficiently. We also combine our strategies with two existing strategies: uncertainty sampling, a well-known strategy for active learning, and the exploitation-only strategy, which is used in learning-to-enumerate problems. Our experiments on two publicly available English text datasets show that our methods outperform the baseline methods in both problem settings.
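The two existing strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes a binary classifier that outputs, for each unlabeled text, a probability of belonging to the target class, and it uses the common textbook definitions of the two strategies. The function names and the sample probabilities are hypothetical.

```python
from typing import Sequence


def uncertainty_sampling(probs: Sequence[float]) -> int:
    """Pick the unlabeled item the model is least sure about.

    For a binary model, that is the item whose predicted probability
    of the target class is closest to 0.5.
    """
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))


def exploitation_only(probs: Sequence[float]) -> int:
    """Pick the unlabeled item most likely to belong to the target class."""
    return max(range(len(probs)), key=lambda i: probs[i])


# Hypothetical predicted target-class probabilities for four unlabeled texts.
pool_probs = [0.95, 0.52, 0.10, 0.70]

next_for_accuracy = uncertainty_sampling(pool_probs)   # index 1 (0.52 is nearest 0.5)
next_for_enumeration = exploitation_only(pool_probs)   # index 0 (0.95 is the highest)
```

The two selections differ because the goals differ: uncertainty sampling asks for the label that should teach the model the most, while exploitation-only asks for the item most likely to be a target-class instance right now.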