Importance of Data Preprocessing and Parameters Tuning for Supervised Machine Learning Models on Tweets Sentiment Analysis
Saurab Adhikari. The Batuk, published 2024-01-29. DOI: 10.3126/batuk.v10i1.62303 (https://doi.org/10.3126/batuk.v10i1.62303)
This paper compares five supervised machine learning models for tweet sentiment analysis, reporting the accuracy and classification report of each, and shows how accuracy improves when the data is preprocessed and the model parameters are tuned. The five models are Naive Bayes, Support Vector Machine (SVM), Random Forest, Long Short-Term Memory (LSTM), and XGBoost. A total of 25,000 tweets were processed and analyzed, and each model predicted whether a tweet was positive, negative, or neutral. This research is intended to help readers decide which models to use and which yield higher accuracy under different data preprocessing and parameter-tuning approaches. The paper also argues that standard models can still perform well and remain viable for sentiment analysis, where SVM and Random Forest classifiers may be viewed as standard learning strategies.
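The abstract does not spell out which preprocessing steps were applied or which parameters were tuned. As a minimal, standard-library-only sketch of the two ideas for one of the five models (Naive Bayes), the example below cleans tweets and then tunes a single hyperparameter, the additive-smoothing constant `alpha`, by grid search on a held-out validation split. Everything here is an assumption for illustration and is not taken from the paper: the cleaning rules (lowercasing, removing URLs and @mentions, stripping non-letters, a tiny stopword list), the hypothetical helpers `preprocess` and `tune_alpha`, the `alpha` grid, and the made-up toy tweets.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption); real pipelines use a fuller list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "i", "to", "and", "of", "it", "this"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess(tweet: str) -> list[str]:
    """Assumed cleaning: lowercase, drop URLs and @mentions, strip non-letters, remove stopwords."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)  # '#happy' -> ' happy', '5pm' -> ' pm'
    return [tok for tok in text.split() if tok not in STOPWORDS]


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with additive smoothing `alpha` (must be > 0)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            self.counts[c] = Counter(tok for d in class_docs for tok in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, docs):
        vocab_size = len(self.vocab)
        preds = []
        for doc in docs:
            scores = {}
            for c in self.classes:
                denom = self.totals[c] + self.alpha * vocab_size
                scores[c] = self.log_prior[c] + sum(
                    math.log((self.counts[c][tok] + self.alpha) / denom)
                    for tok in doc
                    if tok in self.vocab  # words never seen in training carry no evidence
                )
            preds.append(max(scores, key=scores.get))  # highest log-posterior wins
        return preds


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def tune_alpha(train_docs, train_y, val_docs, val_y, grid=(0.01, 0.1, 0.5, 1.0, 2.0)):
    """One-parameter grid search: keep the alpha with the best validation accuracy."""
    results = {}
    for alpha in grid:
        model = MultinomialNB(alpha).fit(train_docs, train_y)
        results[alpha] = accuracy(val_y, model.predict(val_docs))
    return max(results, key=results.get), results


# Made-up toy tweets for illustration only; NOT the paper's 25,000-tweet dataset.
TOY_TRAIN = [
    ("I love this phone! https://t.co/x", "positive"),
    ("@bob what a great day #happy", "positive"),
    ("This is the worst service ever", "negative"),
    ("I hate waiting, so bad", "negative"),
    ("The meeting is at 5pm", "neutral"),
    ("Train leaves from platform two", "neutral"),
]
TOY_VAL = [
    ("love this great phone", "positive"),
    ("so bad, worst ever", "negative"),
    ("platform two at 5pm", "neutral"),
]

if __name__ == "__main__":
    train_docs = [preprocess(t) for t, _ in TOY_TRAIN]
    train_y = [y for _, y in TOY_TRAIN]
    val_docs = [preprocess(t) for t, _ in TOY_VAL]
    val_y = [y for _, y in TOY_VAL]
    best_alpha, scores = tune_alpha(train_docs, train_y, val_docs, val_y)
    model = MultinomialNB(best_alpha).fit(train_docs, train_y)
    print(best_alpha, scores)
    print(model.predict([preprocess("What a great phone, love it")]))
```

The same fit-on-training, score-on-held-out, keep-the-best loop would apply to the other models' hyperparameters (for example SVM regularization or Random Forest tree counts); in practice a library such as scikit-learn, with its `MultinomialNB` and `GridSearchCV` classes, would replace the hand-written parts.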