M. P. Paing, C. Pintavirooj, S. Tungjitkusolmun, S. Choomchuay, K. Hamamoto
Published in: 2018 11th Biomedical Engineering International Conference (BMEiCON), 2018-11-01. DOI: 10.1109/BMEICON.2018.8609946 (https://doi.org/10.1109/BMEICON.2018.8609946)
Comparison of Sampling Methods for Imbalanced Data Classification in Random Forest
Imbalanced data classification is a serious and challenging task for most medical image diagnosis applications. These applications usually produce far more false samples than actual ones. That is, the number of samples for the class of interest (the minority class) is significantly smaller than the number for the other classes (the majority). Classification performed on such data is called imbalanced data classification. As a consequence, the learning model becomes biased toward the majority class and fails to classify the minority class. Data sampling and ensemble methods are common ways to compensate for this issue. Random forest (RF), an ensemble of multiple decision trees, is widely used for both classification and regression problems because of its robust and accurate predictions. However, it also suffers from class bias in imbalanced data classification problems. This paper proposes and compares different sampling methods to address imbalanced data classification in RF.
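The abstract does not say which sampling methods the paper compares, so the following is only an illustrative sketch of two common baselines, random undersampling and random oversampling, and should not be read as the authors' procedure. The function names `random_undersample` and `random_oversample`, the toy 95-to-5 class split, and the fixed seed are all assumptions made for this example. It uses only the Python standard library; in practice the rebalanced data would then be passed to a random forest learner such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def _group_by_class(samples, labels):
    """Collect the samples belonging to each class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Randomly discard samples until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        # Sampling without replacement keeps `target` distinct samples per class.
        pairs.extend((x, y) for x in rng.sample(xs, target))
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    pairs = []
    for y, xs in by_class.items():
        # Keep every original sample, then add random duplicates to reach `target`.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        pairs.extend((x, y) for x in xs + extra)
    rng.shuffle(pairs)
    return [x for x, _ in pairs], [y for _, y in pairs]


# Toy data (assumed for illustration): 95 majority samples (label 0), 5 minority (label 1).
X = [[i] for i in range(100)]
y = [0] * 95 + [1] * 5

X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)

print(sorted(Counter(y_under).items()))  # [(0, 5), (1, 5)]
print(sorted(Counter(y_over).items()))   # [(0, 95), (1, 95)]
```

Undersampling discards majority samples and so loses information, while oversampling repeats minority samples and can encourage overfitting to them; a comparison of sampling methods like the one the abstract describes is a way to weigh such trade-offs. Resampling should be applied to the training split only, so that the evaluation data keeps the original class distribution.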