Sentiment Analysis of Reviews in Kazakh With Transfer Learning Techniques

A. Nugumanova, Y. Baiburin, Yermek Alimzhanov
{"title":"Sentiment Analysis of Reviews in Kazakh With Transfer Learning Techniques","authors":"A. Nugumanova, Y. Baiburin, Yermek Alimzhanov","doi":"10.1109/SIST54437.2022.9945811","DOIUrl":null,"url":null,"abstract":"Heavily pretrained transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) or Generative Pre-trained Transformer (GPT), have successfully demonstrated the superior ability to recognize the right sentiments of texts in English or other dominant languages. However, for low-resource languages such as Kazakh, there are no similar models due to the high computational and memory requirements for their training and the lack of labeled datasets. Under this circumstance, transfer learning can be applied to low-resource language using a pretrained multilingual or related-language model. In this paper, we consider two ways to implement the transfer learning strategy: zero-shot learning and fine-tuning. We design experiments to compare these two methods and report the obtained results. Experiments show that in both cases BERT-based multilingual sentiment analysis model performs better than the BERT-based model for Turkish language, and the performance of these models grows after fine-tuning even with a very small number of samples in Kazakh.","PeriodicalId":207613,"journal":{"name":"2022 International Conference on Smart Information Systems and Technologies (SIST)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Smart Information Systems and Technologies (SIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIST54437.2022.9945811","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Heavily pretrained transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) or the Generative Pre-trained Transformer (GPT), have demonstrated a superior ability to recognize the sentiment of texts in English and other dominant languages. However, for low-resource languages such as Kazakh, no comparable models exist, owing to the high computational and memory requirements of their training and the lack of labeled datasets. In this circumstance, transfer learning can be applied to a low-resource language using a pretrained multilingual or related-language model. In this paper, we consider two ways to implement the transfer learning strategy: zero-shot learning and fine-tuning. We design experiments to compare these two methods and report the obtained results. The experiments show that in both cases the BERT-based multilingual sentiment analysis model performs better than the BERT-based model for Turkish, and that the performance of both models improves after fine-tuning, even with a very small number of Kazakh samples.
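
As a rough illustration of the fine-tuning route described in the abstract, the sketch below adapts a pretrained multilingual BERT checkpoint to a two-class Kazakh sentiment task with the Hugging Face transformers library. The checkpoint name, the toy review sentences, the label scheme, and the hyperparameters are all assumptions made for illustration; the paper does not publish its exact configuration. Running the same classification head without the training loop corresponds conceptually to the zero-shot setting.

```python
# Minimal fine-tuning sketch (not the authors' exact setup): adapt a multilingual
# BERT to a tiny, hypothetical Kazakh sentiment sample. Requires torch + transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # assumed multilingual checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Illustrative Kazakh reviews with hypothetical labels: 1 = positive, 0 = negative.
texts = ["Тамаша фильм, өте ұнады!", "Қызмет нашар, ұсынбаймын."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Fine-tuning: a few gradient steps on the labeled Kazakh samples.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a handful of epochs is enough for this toy run
    optimizer.zero_grad()
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()

# Inference; applying the pretrained model here *without* the loop above would
# mirror the zero-shot transfer setting compared in the paper.
model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())
```

Swapping MODEL_NAME for a Turkish BERT checkpoint would reproduce the related-language variant of the comparison, keeping the rest of the pipeline unchanged.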