Gauthami Sreenivas, Kishan Minna Murthy, Kshitij Prit Gopali, Navya Eedula, Mamatha H R
Sentiment Analysis of Hotel Reviews - a Comparative Study
2023 IEEE 8th International Conference for Convergence in Technology (I2CT), published 2023-04-07
DOI: 10.1109/I2CT57861.2023.10126445
Abstract
Sentiment analysis is an important area of Natural Language Processing (NLP) because it offers an efficient way to extract features and user sentiment from textual data. In the tourism industry, sentiment analysis of large volumes of review data helps businesses understand their customers' needs and improve hotel facilities to increase customer satisfaction. This paper implements, analyzes, and compares supervised, unsupervised, and pre-trained models. Six supervised models (Decision Trees, XGBoost, Multinomial Naïve Bayes, Multinomial Logistic Regression, SVM, and Stochastic Gradient Descent) were tested, with hyperparameters optimised using GridSearchCV. Two unsupervised models, K-means clustering and Latent Dirichlet Allocation, were implemented with TF-IDF and Word2Vec embeddings. Two pre-trained models, VADER and TextBlob, were also implemented. The labelled dataset used in this study contains user reviews of hotels around the world, each classified as positive, neutral, or negative. The SVM model achieved the highest weighted F1 score, 0.8516.
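The abstract reports a weighted F1 score. As a minimal sketch of how that metric is commonly defined for a three-class problem, the snippet below computes the F1 score of each class and averages them weighted by class support (the number of true instances of each class). The function name `weighted_f1` and the toy labels are hypothetical illustrations, not the authors' code or data, and the paper does not state which implementation was used.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # true instances per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted as this class and actually this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Each class contributes in proportion to its share of the true labels.
        score += f1 * n_true / total
    return score


# Hypothetical toy labels, NOT data from the paper.
y_true = ["positive", "positive", "positive", "neutral", "negative", "negative"]
y_pred = ["positive", "positive", "positive", "neutral", "negative", "positive"]
print(round(weighted_f1(y_true, y_pred), 4))  # 0.8175
```

Because each class is weighted by its support, a majority class (often "positive" in review data) dominates the score, which is worth keeping in mind when comparing models on an imbalanced dataset.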