Khaldoon Alshouiliy, Sujan Ray, A. Al-Ghamdi, D. Agrawal
{"title":"利用(基于K-NN的SMOTE_3D算法)增强不平衡数据集","authors":"Khaldoon Alshouiliy, Sujan Ray, A. Al-Ghamdi, D. Agrawal","doi":"10.17352/ARA.000002","DOIUrl":null,"url":null,"abstract":"Big data is currently a huge industry that has grown significantly every year. Big data is being used by machine learning and deep learning algorithm to study, analyze and parse big data and then drive useful and beneficial results. However, most of the real datasets are collected through different organizations and social media and mainly fall under the category of Big Data applications. One of the biggest and most drawbacks of such datasets is an imbalance representation of samples from different categories. In such case, the classifiers and deep learning techniques are not capable of handling issues like these. A majority of existing works tend to overlook these issues. Typical data balancing methods in the literature resort to data resampling whether it is under sampling a majority class samples or oversampling the minority class of samples. In this work, we focus on the minority sample and ignore the majority ones. Many researchers have done many works as most of the work suffers from over sampling or form the generated noise in the dataset. Additionally, works are either suitable for either big data or small data. Moreover, some other work suffers from a long processing time as complicated algorithms are used with many steps to fix the imbalance problem. Therefore, we introduce a new algorithm that deals with all these issues. We have created a short example to explain briefly how the SMOTE works and why we need to enhance the SMOTE and we have done this by using a very well-known imbalance dataset that we downloaded from the Kaggle website. We collect the results by using Azure machine learning platform. Then, we compare the results to see that the model is functional just good with SMOTE and way better than without it.","PeriodicalId":73286,"journal":{"name":"IEEE International Conference on Robotics and Automation : ICRA : [proceedings]. IEEE International Conference on Robotics and Automation","volume":"4 1","pages":"001-006"},"PeriodicalIF":0.0000,"publicationDate":"2020-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Enhancing Imbalanced Dataset by Utilizing (K-NN Based SMOTE_3D Algorithm)\",\"authors\":\"Khaldoon Alshouiliy, Sujan Ray, A. Al-Ghamdi, D. Agrawal\",\"doi\":\"10.17352/ARA.000002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Big data is currently a huge industry that has grown significantly every year. Big data is being used by machine learning and deep learning algorithm to study, analyze and parse big data and then drive useful and beneficial results. However, most of the real datasets are collected through different organizations and social media and mainly fall under the category of Big Data applications. One of the biggest and most drawbacks of such datasets is an imbalance representation of samples from different categories. In such case, the classifiers and deep learning techniques are not capable of handling issues like these. A majority of existing works tend to overlook these issues. Typical data balancing methods in the literature resort to data resampling whether it is under sampling a majority class samples or oversampling the minority class of samples. In this work, we focus on the minority sample and ignore the majority ones. Many researchers have done many works as most of the work suffers from over sampling or form the generated noise in the dataset. Additionally, works are either suitable for either big data or small data. Moreover, some other work suffers from a long processing time as complicated algorithms are used with many steps to fix the imbalance problem. Therefore, we introduce a new algorithm that deals with all these issues. We have created a short example to explain briefly how the SMOTE works and why we need to enhance the SMOTE and we have done this by using a very well-known imbalance dataset that we downloaded from the Kaggle website. We collect the results by using Azure machine learning platform. Then, we compare the results to see that the model is functional just good with SMOTE and way better than without it.\",\"PeriodicalId\":73286,\"journal\":{\"name\":\"IEEE International Conference on Robotics and Automation : ICRA : [proceedings]. IEEE International Conference on Robotics and Automation\",\"volume\":\"4 1\",\"pages\":\"001-006\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Conference on Robotics and Automation : ICRA : [proceedings]. IEEE International Conference on Robotics and Automation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17352/ARA.000002\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Conference on Robotics and Automation : ICRA : [proceedings]. IEEE International Conference on Robotics and Automation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17352/ARA.000002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Enhancing Imbalanced Dataset by Utilizing (K-NN Based SMOTE_3D Algorithm)
Big data is currently a huge industry that has grown significantly every year. Big data is being used by machine learning and deep learning algorithm to study, analyze and parse big data and then drive useful and beneficial results. However, most of the real datasets are collected through different organizations and social media and mainly fall under the category of Big Data applications. One of the biggest and most drawbacks of such datasets is an imbalance representation of samples from different categories. In such case, the classifiers and deep learning techniques are not capable of handling issues like these. A majority of existing works tend to overlook these issues. Typical data balancing methods in the literature resort to data resampling whether it is under sampling a majority class samples or oversampling the minority class of samples. In this work, we focus on the minority sample and ignore the majority ones. Many researchers have done many works as most of the work suffers from over sampling or form the generated noise in the dataset. Additionally, works are either suitable for either big data or small data. Moreover, some other work suffers from a long processing time as complicated algorithms are used with many steps to fix the imbalance problem. Therefore, we introduce a new algorithm that deals with all these issues. We have created a short example to explain briefly how the SMOTE works and why we need to enhance the SMOTE and we have done this by using a very well-known imbalance dataset that we downloaded from the Kaggle website. We collect the results by using Azure machine learning platform. Then, we compare the results to see that the model is functional just good with SMOTE and way better than without it.