Exploring transformer models in the sentiment analysis task for the under-resource Bengali language
Md. Nesarul Hoque, Umme Salma, Md. Jamal Uddin, Md. Martuza Ahamad, Sakifa Aktar
Natural Language Processing Journal, volume 8, Article 100091. Published 2024-07-17.
DOI: 10.1016/j.nlp.2024.100091
https://www.sciencedirect.com/science/article/pii/S2949719124000396
Abstract
Sentiment analysis (SA) determines whether a comment or piece of feedback from an online user or customer about an object, such as a movie, drama, or food item, is positive or negative. Such user sentiment can benefit various decision-making processes. Many studies have addressed sentiment identification from text in high-resource languages like English. However, few studies exist for the under-resourced Bengali language, owing to the lack of a benchmark corpus, the limitations of text-processing software, and similar obstacles. There is also still room to improve classification performance on the SA task. In this research, we experiment on a recognized Bengali dataset of 11,807 comments to classify each comment as positive or negative. We employ five state-of-the-art transformer-based pretrained models, with hyperparameter tuning: multilingual Bidirectional Encoder Representations from Transformers (mBERT), BanglaBERT, Bangla-Bert-Base, DistilmBERT, and XLM-RoBERTa-base (XLM-R-base). We then propose a combined model, named Transformer-ensemble, which reaches an accuracy of 95.97% and an F1-score of 95.96%, a strong result compared with recent existing methods on the Bengali SA task.
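The abstract does not say how Transformer-ensemble combines the five models. As a rough illustration only, the sketch below assumes soft voting: each model's predicted probability of the positive class is averaged, and the mean is thresholded at 0.5. The model names come from the abstract; the probabilities, the gold labels, the threshold, and the helper names (`soft_vote`, `accuracy`, `f1_binary`) are hypothetical and are not taken from the paper. The F1 shown is the binary F1 on the positive class, which may differ from the averaging the authors used.

```python
from typing import Dict, List, Sequence

# Hypothetical P(positive) from each model for four comments.
# Only the model names come from the abstract; the numbers are made up.
MODEL_PROBS: Dict[str, List[float]] = {
    "mBERT":            [0.90, 0.40, 0.20, 0.70],
    "BanglaBERT":       [0.80, 0.60, 0.10, 0.40],
    "Bangla-Bert-Base": [0.70, 0.30, 0.30, 0.60],
    "DistilmBERT":      [0.60, 0.70, 0.20, 0.30],
    "XLM-R-base":       [0.95, 0.75, 0.15, 0.25],
}


def soft_vote(model_probs: Dict[str, Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average P(positive) across models; label 1 (positive) if mean >= threshold."""
    per_comment = zip(*model_probs.values())  # one tuple of model scores per comment
    return [int(sum(scores) / len(scores) >= threshold) for scores in per_comment]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of comments whose predicted label matches the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 on the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    gold = [1, 1, 0, 1]  # hypothetical gold labels (1 = positive, 0 = negative)
    preds = soft_vote(MODEL_PROBS)
    print("predictions:", preds)
    print("accuracy:", accuracy(gold, preds))
    print("F1 (positive class):", f1_binary(gold, preds))
```

Other combination rules, such as majority voting over hard labels or weighting models by validation accuracy, would fit the same interface by swapping out `soft_vote`.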