{"title":"Sentiment Analysis of Reviews in Kazakh With Transfer Learning Techniques","authors":"A. Nugumanova, Y. Baiburin, Yermek Alimzhanov","doi":"10.1109/SIST54437.2022.9945811","DOIUrl":null,"url":null,"abstract":"Heavily pretrained transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) or Generative Pre-trained Transformer (GPT), have successfully demonstrated the superior ability to recognize the right sentiments of texts in English or other dominant languages. However, for low-resource languages such as Kazakh, there are no similar models due to the high computational and memory requirements for their training and the lack of labeled datasets. Under this circumstance, transfer learning can be applied to low-resource language using a pretrained multilingual or related-language model. In this paper, we consider two ways to implement the transfer learning strategy: zero-shot learning and fine-tuning. We design experiments to compare these two methods and report the obtained results. Experiments show that in both cases BERT-based multilingual sentiment analysis model performs better than the BERT-based model for Turkish language, and the performance of these models grows after fine-tuning even with a very small number of samples in Kazakh.","PeriodicalId":207613,"journal":{"name":"2022 International Conference on Smart Information Systems and Technologies (SIST)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Smart Information Systems and Technologies (SIST)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIST54437.2022.9945811","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Heavily pretrained transformer models, such as Bidirectional Encoder Representations from Transformers (BERT) or the Generative Pre-trained Transformer (GPT), have demonstrated a superior ability to recognize the sentiment of texts in English and other dominant languages. However, for low-resource languages such as Kazakh, no comparable models exist, owing to the high computational and memory requirements of training them and the lack of labeled datasets. Under these circumstances, transfer learning can be applied to a low-resource language using a pretrained multilingual or related-language model. In this paper, we consider two ways to implement the transfer learning strategy: zero-shot learning and fine-tuning. We design experiments to compare these two methods and report the results obtained. The experiments show that in both cases the BERT-based multilingual sentiment analysis model performs better than the BERT-based model for Turkish, and that the performance of both models improves after fine-tuning even with a very small number of Kazakh samples.
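
The abstract does not include code, but the two transfer strategies it compares can be illustrated with a short sketch using the Hugging Face transformers and datasets libraries. The checkpoint name (nlptown/bert-base-multilingual-uncased-sentiment), the Kazakh example sentences, and the hyperparameters below are assumptions made purely for illustration; they are not the models, data, or settings used by the authors.

```python
# Minimal sketch of the two transfer strategies (zero-shot vs. fine-tuning),
# assuming a publicly available multilingual BERT sentiment checkpoint.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, pipeline)

CHECKPOINT = "nlptown/bert-base-multilingual-uncased-sentiment"  # assumed model

# --- Zero-shot: apply the multilingual model to Kazakh text with no Kazakh training ---
classifier = pipeline("sentiment-analysis", model=CHECKPOINT)
reviews = [
    "Бұл өнім өте жақсы, маған қатты ұнады.",  # "This product is very good, I liked it a lot."
    "Қызмет нашар болды, ұсынбаймын.",         # "The service was bad, I do not recommend it."
]
for text, pred in zip(reviews, classifier(reviews)):
    print(text, "->", pred["label"], round(pred["score"], 3))

# --- Fine-tuning: continue training on a (very small) labeled Kazakh sample ---
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSequenceClassification.from_pretrained(CHECKPOINT)

# Toy labeled data (invented examples); this checkpoint predicts 1-5 stars,
# so labels are class indices 0-4.
train_data = Dataset.from_dict({
    "text": ["Бұл өнім өте жақсы.", "Қызмет нашар болды."],
    "label": [4, 0],
})
train_data = train_data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True,
)

args = TrainingArguments(output_dir="kk-sentiment", num_train_epochs=3,
                         per_device_train_batch_size=2, logging_steps=1)
Trainer(model=model, args=args, train_dataset=train_data).train()
```

In the zero-shot case the model is used as-is, relying entirely on what it learned from high-resource languages; in the fine-tuning case training simply continues on whatever small labeled Kazakh set is available, which is the setting in which the abstract reports performance gains.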