Phakhawat Sarakit, T. Theeramunkong, C. Haruechaiyasak
{"title":"利用SMOTE算法改进YouTube不平衡数据集的情绪分类","authors":"Phakhawat Sarakit, T. Theeramunkong, C. Haruechaiyasak","doi":"10.1109/ICAICTA.2015.7335373","DOIUrl":null,"url":null,"abstract":"The imbalanced dataset problem triggers degradation of classification performance in several data mining applications including pattern recognition, text categorization, and information filtering tasks. To improve emotion classification performance, we use a sampling-based algorithm called SMOTE, which oversamples instances in a minority class to the number of those from the majority class. YouTube dataset was balanced using the SMOTE technique and tested using three machine learning algorithms, namely multinomial Naïve Bayes (MNB), decision tree (DT) and support vector machines (SVM). As a result, SVM achieves the highest accuracy with 93.30% on filtering task and 89.44% on classification. The SMOTE technique can solve the imbalanced data problem and obtain an improved classification result.","PeriodicalId":319020,"journal":{"name":"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm\",\"authors\":\"Phakhawat Sarakit, T. Theeramunkong, C. Haruechaiyasak\",\"doi\":\"10.1109/ICAICTA.2015.7335373\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The imbalanced dataset problem triggers degradation of classification performance in several data mining applications including pattern recognition, text categorization, and information filtering tasks. To improve emotion classification performance, we use a sampling-based algorithm called SMOTE, which oversamples instances in a minority class to the number of those from the majority class. YouTube dataset was balanced using the SMOTE technique and tested using three machine learning algorithms, namely multinomial Naïve Bayes (MNB), decision tree (DT) and support vector machines (SVM). As a result, SVM achieves the highest accuracy with 93.30% on filtering task and 89.44% on classification. The SMOTE technique can solve the imbalanced data problem and obtain an improved classification result.\",\"PeriodicalId\":319020,\"journal\":{\"name\":\"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)\",\"volume\":\"40 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-11-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAICTA.2015.7335373\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAICTA.2015.7335373","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Improving emotion classification in imbalanced YouTube dataset using SMOTE algorithm
The imbalanced dataset problem triggers degradation of classification performance in several data mining applications including pattern recognition, text categorization, and information filtering tasks. To improve emotion classification performance, we use a sampling-based algorithm called SMOTE, which oversamples instances in a minority class to the number of those from the majority class. YouTube dataset was balanced using the SMOTE technique and tested using three machine learning algorithms, namely multinomial Naïve Bayes (MNB), decision tree (DT) and support vector machines (SVM). As a result, SVM achieves the highest accuracy with 93.30% on filtering task and 89.44% on classification. The SMOTE technique can solve the imbalanced data problem and obtain an improved classification result.