Code-Mixed Sentiment Analysis Using Machine Learning Approach – A Systematic Literature Review
C. Tho, H. Warnars, B. Soewito, F. Gaol
2020 4th International Conference on Informatics and Computational Sciences (ICICoS), published 2020-11-10
DOI: https://doi.org/10.1109/ICICoS51170.2020.9299004
Citations: 3
Abstract
Code-mixed language is ubiquitous. Long practiced in bilingual communities, it has become a common mode of communication among social media users. Despite its popularity, code-mixed text is challenging to analyze because it typically does not follow the grammar of any single language. The growth of social media over the past ten years has therefore drawn wide attention to methods for analyzing code-mixed text, such as extracting sentiment from it. Machine learning classifiers such as Support Vector Machine, Naïve Bayes, Decision Tree, and Logistic Regression have been widely used for sentiment analysis. This paper examines the machine learning classifiers used for code-mixed sentiment analysis, their performance, the variables involved, and which classifiers are most common. The review followed the PRISMA methodology: of 230 papers, 12 met the predefined inclusion criteria, which included publication within the last 5 years. The most common classifiers in the selected papers were Support Vector Machine, Naïve Bayes, and Logistic Regression. Measured by accuracy and F1, Support Vector Machine performed better than Naïve Bayes and Logistic Regression. This study thus supports the use of Support Vector Machine, Naïve Bayes, and Logistic Regression as the main classifiers for code-mixed sentiment analysis.
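The abstract compares classifiers by accuracy and F1 without stating how F1 is averaged across sentiment classes. As a minimal sketch, assuming macro-averaged F1 over three hypothetical labels (`pos`, `neg`, `neu`), the two measures could be computed as follows. The function names, labels, and sample predictions are illustrative and do not come from the reviewed papers.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and classifier predictions for six posts.
gold = ["pos", "neg", "neu", "pos", "neg", "pos"]
pred = ["pos", "neg", "pos", "pos", "neu", "pos"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred):.3f}")
```

On imbalanced sentiment data the two measures can diverge noticeably, since accuracy rewards getting the majority class right while macro F1 weights every class equally. That may be one reason to report both.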