Exploring transformer models in the sentiment analysis task for the under-resource Bengali language

Md. Nesarul Hoque, Umme Salma, Md. Jamal Uddin, Md. Martuza Ahamad, Sakifa Aktar
Natural Language Processing Journal, Volume 8, Article 100091 (published 2024-07-17). DOI: 10.1016/j.nlp.2024.100091. Article page: https://www.sciencedirect.com/science/article/pii/S2949719124000396
Citations: 0

Abstract

In the sentiment analysis (SA) task, we determine whether a comment or piece of feedback from an online user or customer about some object, such as a movie, drama, or food, is positive or negative. Such user sentiment can inform a variety of decision-making processes. Many studies have addressed sentiment identification in text for high-resource languages such as English. However, few studies address the under-resourced Bengali language, owing to the lack of benchmark corpora, the limitations of text-processing software, and similar obstacles. Furthermore, there is still room to improve classification performance on the SA task. In this research, we experiment on a recognized Bengali dataset of 11,807 comments to classify each comment as expressing positive or negative sentiment. We employ five state-of-the-art transformer-based pretrained models, namely multilingual Bidirectional Encoder Representations from Transformers (mBERT), BanglaBERT, Bangla-Bert-Base, DistilmBERT, and XLM-RoBERTa-base (XLM-R-base), and tune their hyperparameters. We then propose a combined model, named Transformer-ensemble, which achieves an accuracy of 95.97% and an F1-score of 95.96%, outperforming recent existing methods for the Bengali SA task.
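The abstract does not say how Transformer-ensemble combines the five fine-tuned models, nor which F1 averaging it reports. The sketch below is therefore only an illustration under stated assumptions: it assumes soft voting (averaging each model's probability that a comment is positive, then thresholding at 0.5) and reports macro F1 over the two classes. The helper names `soft_vote`, `accuracy`, `f1_for_class`, and `macro_f1` are hypothetical, and the probabilities in the usage section are toy numbers, not results from the paper.

```python
from typing import List, Sequence

# Assumed label convention for this sketch: 1 = positive, 0 = negative.


def soft_vote(prob_lists: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Hypothetical soft-voting ensemble: average P(positive) across models, then threshold.

    prob_lists[m][i] is model m's probability that comment i is positive.
    """
    if not prob_lists:
        raise ValueError("need at least one model")
    n_items = len(prob_lists[0])
    if any(len(p) != n_items for p in prob_lists):
        raise ValueError("every model must score the same comments")
    n_models = len(prob_lists)
    averaged = [sum(p[i] for p in prob_lists) / n_models for i in range(n_items)]
    return [1 if a >= threshold else 0 for a in averaged]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of comments whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 for one class, computed as 2TP / (2TP + FP + FN); 0.0 if undefined."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for labels 0 and 1 (assumed averaging)."""
    return (f1_for_class(y_true, y_pred, 0) + f1_for_class(y_true, y_pred, 1)) / 2


if __name__ == "__main__":
    # Toy P(positive) scores from three hypothetical fine-tuned models on four comments
    # (illustrative numbers only, not results from the paper).
    probs = [
        [0.9, 0.4, 0.2, 0.6],
        [0.8, 0.6, 0.1, 0.3],
        [0.7, 0.7, 0.4, 0.4],
    ]
    gold = [1, 1, 0, 1]
    pred = soft_vote(probs)
    print(pred)                            # [1, 1, 0, 0]
    print(accuracy(gold, pred))            # 0.75
    print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Soft voting keeps each model's confidence in the decision; hard (majority) voting over predicted labels is the other common choice, and with five models it cannot tie on a binary task. Either could plausibly underlie the paper's ensemble.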
