{"title":"Sentiment Analysis of Reviews in Kazakh With Transfer Learning Techniques","authors":"A. Nugumanova, Y. Baiburin, Yermek Alimzhanov","doi":"10.1109/SIST54437.2022.9945811","DOIUrl":null,"url":null,"abstract":"Heavily pretrained transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) or Generative Pre-trained Transformer (GPT), have successfully demonstrated the superior ability to recognize the right sentiments of texts in English or other dominant languages. However, for low-resource languages such as Kazakh, there are no similar models due to the high computational and memory requirements for their training and the lack of labeled datasets. Under this circumstance, transfer learning can be applied to low-resource language using a pretrained multilingual or related-language model. In this paper, we consider two ways to implement the transfer learning strategy: zero-shot learning and fine-tuning. We design experiments to compare these two methods and report the obtained results. Experiments show that in both cases BERT-based multilingual sentiment analysis model performs better than the BERT-based model for Turkish language, and the performance of these models grows after fine-tuning even with a very small number of samples in Kazakh.","PeriodicalId":207613,"journal":{"name":"2022 International Conference on Smart Information Systems and Technologies (SIST)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Smart Information Systems and Technologies (SIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIST54437.2022.9945811","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Heavily pretrained transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) or the Generative Pre-trained Transformer (GPT), have demonstrated a superior ability to recognize the sentiment of texts in English and other dominant languages. However, for low-resource languages such as Kazakh, no comparable models exist, owing to the high computational and memory requirements of training them and the lack of labeled datasets. Under these circumstances, transfer learning can be applied to a low-resource language using a pretrained multilingual or related-language model. In this paper, we consider two ways to implement the transfer learning strategy: zero-shot learning and fine-tuning. We design experiments to compare these two methods and report the results obtained. The experiments show that in both cases the BERT-based multilingual sentiment analysis model performs better than the BERT-based model for Turkish, and that the performance of both models improves after fine-tuning even with a very small number of Kazakh samples.
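
The abstract does not include code, but the two transfer strategies it compares can be illustrated with a short sketch using the Hugging Face transformers and datasets libraries. The checkpoint name (nlptown/bert-base-multilingual-uncased-sentiment), the Kazakh example sentences, and the hyperparameters below are assumptions made purely for illustration; they are not the models, data, or settings used by the authors.

```python
# Minimal sketch of the two transfer strategies (zero-shot vs. fine-tuning),
# assuming a publicly available multilingual BERT sentiment checkpoint.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, pipeline)

CHECKPOINT = "nlptown/bert-base-multilingual-uncased-sentiment"  # assumed model

# --- Zero-shot: apply the multilingual model to Kazakh text with no Kazakh training ---
classifier = pipeline("sentiment-analysis", model=CHECKPOINT)
reviews = [
    "Бұл өнім өте жақсы, маған қатты ұнады.",  # "This product is very good, I liked it a lot."
    "Қызмет нашар болды, ұсынбаймын.",         # "The service was bad, I do not recommend it."
]
for text, pred in zip(reviews, classifier(reviews)):
    print(text, "->", pred["label"], round(pred["score"], 3))

# --- Fine-tuning: continue training on a (very small) labeled Kazakh sample ---
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)

# Toy labeled data (invented examples); this checkpoint predicts 1-5 stars,
# so labels are class indices 0-4.
train_data = Dataset.from_dict({
    "text": ["Бұл өнім өте жақсы.", "Қызмет нашар болды."],
    "label": [4, 0],
})
train_data = train_data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

args = TrainingArguments(output_dir="kk-sentiment", num_train_epochs=3,
                         per_device_train_batch_size=2, logging_steps=1)
Trainer(model=model, args=args, train_dataset=train_data).train()
```

In the zero-shot case the model is used as-is, relying entirely on what it learned from high-resource languages; in the fine-tuning case training simply continues on whatever small labeled Kazakh set is available, which is the setting in which the abstract reports performance gains.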