The Impact of Active Learning Algorithm on a Cross-lingual model in a Persian Sentiment Task

Monire Shirghasemi, M. Bokaei, M. Bijankhan
Published in: 2021 7th International Conference on Web Research (ICWR)
Publication date: 2021-05-19
DOI: https://doi.org/10.1109/ICWR51868.2021.9443156
Citations: 2

Abstract

One of the most challenging problems in natural language processing is the lack of annotated training data. In this paper, we examine the impact of an Active Learning algorithm on a cross-lingual model for sentiment analysis in Persian, which is considered a low-resource language. A cross-lingual approach trains a model on a resource-rich source language such as English and applies it to a low-resource language, which reduces the dependence on training datasets. An Active Learning strategy further helps improve the model's performance by selecting the most representative samples. Because labeling data is expensive and time-consuming, selecting the data the model would benefit from most reduces the amount of labeled data the task requires. One way to do this is to select the samples about which the classifier is least confident; once they are chosen, a user is asked to label them. There are many methods and criteria for choosing appropriate data in an Active Learning strategy. Ultimately, these methods help the classifier gain more knowledge about the samples and perform better.
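
The least-confidence selection mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two-class probability layout, and the example pool below are all assumptions made for illustration. It scores each unlabeled sample by one minus its highest predicted class probability and returns the indices of the samples to send to a human annotator.

```python
from typing import List, Sequence


def least_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Return an uncertainty score per sample: 1 - max class probability.

    A higher score means the classifier is less confident in its prediction.
    """
    return [1.0 - max(p) for p in probabilities]


def select_queries(probabilities: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` least-confident samples.

    Hypothetical helper: the paper does not specify a batch size or tie-breaking.
    Ties keep their original pool order because Python's sort is stable.
    """
    scores = least_confidence_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Assumed classifier outputs for an unlabeled pool: [P(negative), P(positive)].
pool_probabilities = [
    [0.95, 0.05],  # confident negative
    [0.55, 0.45],  # uncertain
    [0.20, 0.80],  # fairly confident positive
    [0.51, 0.49],  # most uncertain
]

# Indices of the two samples that would be sent for human labeling.
queries = select_queries(pool_probabilities, batch_size=2)
print(queries)  # [3, 1]
```

In a full Active Learning loop, the newly labeled samples would be added to the training set, the classifier retrained, and the selection repeated until the labeling budget is spent.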