Code-Mixed Sentiment Analysis Using Machine Learning Approach – A Systematic Literature Review

2020 4th International Conference on Informatics and Computational Sciences (ICICoS). Pub Date: 2020-11-10. DOI: 10.1109/ICICoS51170.2020.9299004

C. Tho, H. Warnars, B. Soewito, F. Gaol

Citations: 3

Abstract

Code-mixed language is ubiquitous. Long practiced in bilingual communities, code-mixing has become common among social media users. Despite its popularity, code-mixed text is challenging to analyze because it typically does not comply with the grammar of any single language. The growth of social media over the past ten years has therefore drawn wide attention to methods for analyzing code-mixed text, such as extracting popular sentiment from it. Machine learning classifiers such as Support Vector Machine, Naïve Bayes, Decision Tree, and Logistic Regression have been widely used for this sentiment analysis. This paper examines the machine learning classifiers used for code-mixed sentiment analysis, their performance, the variables involved, and which classifiers are most common. The PRISMA methodology was used, selecting 12 of 230 papers that met predefined criteria, including publication within the last 5 years. Our findings suggest that the most common classifiers in the selected papers were Support Vector Machine, Naïve Bayes, and Logistic Regression. Measured by accuracy and F1, the Support Vector Machine performed better than Naïve Bayes and Logistic Regression. Thus, this study supports the use of Support Vector Machine, Naïve Bayes, and Logistic Regression as the main classifiers for code-mixed sentiment analysis.
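As an illustration of the two performance measures named in the abstract, the following is a minimal sketch of how accuracy and F1 can be computed from a classifier's predictions. The abstract does not say how F1 was averaged across sentiment classes, so the macro average used here is an assumption, and the labels and predictions are hypothetical toy data, not results from the reviewed papers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical gold sentiment labels and classifier predictions.
gold = ["pos", "neg", "neu", "pos", "neg", "pos"]
pred = ["pos", "neg", "pos", "pos", "neu", "pos"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

Accuracy rewards the majority class, while macro F1 weights every sentiment class equally, which is one reason reviews of this kind tend to report both.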
