DL-TBAM: Deep Learning Transformer based Architecture Model for Sentiment Analysis on Tamil-English Dataset
Authors: M. Sangeetha, K. Nimala
Journal: Journal of Intelligent & Fuzzy Systems
Publication date: 2024-03-06
Publication type: Journal Article
DOI: 10.3233/jifs-236971 (https://doi.org/10.3233/jifs-236971)
Citation count: 0
Abstract
Natural language processing (NLP) is a subfield of AI that aims to equip computers with the ability to understand and analyze human language. Sentiment analysis is a widely used application of NLP, particularly for examining attitudes expressed in online conversations. However, many social media comments are written in languages that are not native to their authors, which makes sentiment analysis more difficult, especially for low-resource languages such as Tamil. To address this issue, a code-mixed, sentiment-annotated corpus in Tamil and English was created. This article explains how the corpus was built, including the data collection process and the assignment of polarities. It also examines inter-annotator agreement and the results of sentiment analysis performed on the corpus. The work reports performance metrics, namely precision, recall, support, and F1-score, for transformer-based models such as BERT, RoBERTa, and XLM-RoBERTa. Among these models, XLM-RoBERTa shows slightly better results on the code-mixed corpus when compared with state-of-the-art models.
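As a rough illustration of the reported metrics, the sketch below computes per-class precision, recall, F1-score, and support from gold and predicted sentiment labels. It is only a minimal sketch: the function name `per_class_report`, the label names, and the toy data are assumptions made for illustration and are not taken from the paper or its corpus.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of gold examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    report = {}
    for label in labels:
        # Guard against division by zero for classes never predicted or never present.
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / support[label] if support[label] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = (precision, recall, f1, support[label])
    return report


# Hypothetical toy labels, not drawn from the Tamil-English corpus.
y_true = ["positive", "positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "neutral", "positive", "positive"]

report = per_class_report(y_true, y_pred)
for label, (p, r, f1, n) in report.items():
    print(f"{label:>8}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}  support={n}")
```

Support here is simply the number of gold examples in each class, which is why it is reported alongside the other three scores: it shows how much data each per-class figure rests on.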