{"title":"Active Learning Strategies Based on Text Informativeness","authors":"Ruide Li, Yoko Yamakata, Keishi Tajima","doi":"10.1109/WI-IAT55865.2022.00015","DOIUrl":null,"url":null,"abstract":"In this paper, we propose strategies for selecting the next item to label in active learning for text data. Text data have several text-specific features, such as TF-IDF vectors and document embeddings. These features have correlation with the informativeness of the text data, so our methods select the next item to label by using these text-specific features. We evaluate the performance of our strategies in two problem settings: the standard active learning setting, where we focus on the improvement of the model accuracy, and the learning-to-enumerate setting, where we focus on the efficiency in enumerating all instances of a given target class. We also combine our strategies with two existing strategies: uncertainty sampling, a well-known strategy for active learning, and the exploitation-only strategy, a strategy used in learning-to-enumerate problems. Our experiment on two publicly available English text datasets show that our method outperforms the baseline methods in both problem settings.","PeriodicalId":345445,"journal":{"name":"2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)","volume":"308 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WI-IAT55865.2022.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
In this paper, we propose strategies for selecting the next item to label in active learning for text data. Text data have several text-specific features, such as TF-IDF vectors and document embeddings. These features have correlation with the informativeness of the text data, so our methods select the next item to label by using these text-specific features. We evaluate the performance of our strategies in two problem settings: the standard active learning setting, where we focus on the improvement of the model accuracy, and the learning-to-enumerate setting, where we focus on the efficiency in enumerating all instances of a given target class. We also combine our strategies with two existing strategies: uncertainty sampling, a well-known strategy for active learning, and the exploitation-only strategy, a strategy used in learning-to-enumerate problems. Our experiment on two publicly available English text datasets show that our method outperforms the baseline methods in both problem settings.