Nenny Anggraini, S. Putra, Luh Kesuma Wardhani, Farid Dhiya Ul Arif, Nashrul Hakiem, I. Shofi
JURNAL TEKNIK INFORMATIKA. Published 2024-05-20. DOI: 10.15408/jti.v17i1.38651 (https://doi.org/10.15408/jti.v17i1.38651)
A Comparative Analysis of Random Forest, XGBoost, and LightGBM Algorithms for Emotion Classification in Reddit Comments
This study compares the performance of three classification algorithms, Random Forest, XGBoost, and LightGBM, on emotion classification of Reddit comments. The task is difficult because Reddit comments vary widely in wording and are often ambiguous. The study uses the GoEmotions fine-grained dataset, filtered down to 7,325 Reddit comments labeled with 5 basic emotions. The workflow comprises data preprocessing, feature extraction with CountVectorizer and TF-IDF, and hyperparameter tuning with GridSearchCV for each algorithm. The models are then evaluated using cross-validation and a confusion matrix. Random Forest outperforms both alternatives, reaching 75.38% accuracy compared with 69.05% for XGBoost and 66.63% for LightGBM.
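
The workflow described in the abstract (count features, TF-IDF weighting, grid search, cross-validated confusion matrix) might look roughly like the following scikit-learn sketch. It is an illustration under stated assumptions, not the authors' code: the toy comments, the three label names, the parameter grid, and the fold count are all hypothetical, and only the Random Forest arm is shown. An XGBoost or LightGBM classifier could presumably be swapped into the same pipeline slot.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the filtered GoEmotions comments.
# The label names are illustrative; the abstract does not list the 5 emotions.
texts = [
    "this made my day, so happy", "what a wonderful surprise",
    "love this so much", "best news I have heard all week",
    "this is infuriating", "I am so angry at this decision",
    "stop doing this, it makes me furious", "absolutely outrageous behavior",
    "I feel so alone tonight", "this is heartbreaking news",
    "I miss my old friends so much", "such a sad ending to the story",
]
labels = ["joy"] * 4 + ["anger"] * 4 + ["sadness"] * 4

# Raw counts -> TF-IDF weights -> classifier, chained so that the
# vocabulary is refit on each training fold (avoids leakage).
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Assumed grid; the abstract does not report which values were searched.
param_grid = {
    "clf__n_estimators": [50, 100],
    "clf__max_depth": [None, 10],
}

# Exhaustive search over the grid, scored by cross-validated accuracy.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

# Out-of-fold predictions from the best configuration feed the confusion matrix.
class_names = sorted(set(labels))
predictions = cross_val_predict(search.best_estimator_, texts, labels, cv=3)
matrix = confusion_matrix(labels, predictions, labels=class_names)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)
print(matrix)
```

Rows of the printed matrix correspond to true labels and columns to predicted labels, in the order given by `class_names`. On a dataset this small the accuracy figure is not meaningful; the point is only the shape of the pipeline.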