Observation Imbalanced Data Text to Predict Users Selling Products on Female Daily with SMOTE, Tomek, and SMOTE-Tomek
Bern Jonathan, P. Putra, Y. Ruldeviyani
2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), published 2020-07-01
DOI: 10.1109/IAICT50021.2020.9172033 (https://doi.org/10.1109/IAICT50021.2020.9172033)
Female Daily is a beauty platform whose social media application lets users share their beauty experiences by posting images and text. Its terms and conditions prohibit using posts to sell products, yet some users still use the platform to sell beauty products. User posts are recorded in the Female Daily database, and this data is imbalanced: posts that were banned for selling form the minority class, while posts that the admin did not ban because they do not contain selling form the majority class. SMOTE and Tomek links are techniques for handling imbalanced data by over-sampling and under-sampling, respectively, to bring the data toward balance. In this study, we evaluate the imbalanced Female Daily text data using SMOTE, Tomek, and SMOTE-Tomek. The prediction algorithms are Support Vector Machine (SVM) and Logistic Regression (LR), applied to TF-IDF vectors, to determine the best method for predicting users who sell products on Female Daily. The results show that, for posts not selling products (majority class), the effect of SMOTE, Tomek, and SMOTE-Tomek on precision and recall is not large and also reduces them, whereas for posts selling products (minority class) the effect is a positive improvement. The best combination for each metric is: G-Mean, SMOTE-Tomek with SVM; minority-class precision, SMOTE with SVM; minority-class recall, SMOTE with LR. The experimental results indicate the usefulness of the SMOTE or SMOTE-Tomek approach.
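
The three ingredients named in the abstract (SMOTE over-sampling, Tomek-link detection for under-sampling, and the G-Mean metric) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation, and the abstract does not say which library they used. The toy 2-D vectors stand in for TF-IDF vectors, and the function names `smote`, `tomek_links`, and `g_mean`, the parameter `k`, and the seed are all assumptions made for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=2, seed=0):
    """Over-sample: build n_new synthetic points, each interpolated between
    a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, of opposite-class points that are
    each other's nearest neighbour."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the recall on the positive and the negative class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))


# Toy data (hypothetical): label 0 = majority (not selling), 1 = minority (selling).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 2.0],
     [2.2, 2.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]

minority = [x for x, label in zip(X, y) if label == 1]
print("synthetic minority points:", smote(minority, n_new=1))
print("Tomek links:", tomek_links(X, y))  # points 4 and 5 form the only link
print("G-Mean:", g_mean([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 0]))
```

In this sketch, Tomek under-sampling would then drop the majority-class member of each detected link (one common choice), and SMOTE-Tomek would apply the over-sampling step first and the link removal afterwards. Resampling is normally applied to the training split only, so that the evaluation metrics are computed on the original class distribution.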