A Novel Random Forest and its Application on Classification of Air Quality
Hualing Yi, Qingyu Xiong, Qinghong Zou, Rui Xu, Kai Wang, Min Gao
2019 8th International Congress on Advanced Applied Informatics (IIAI-AAI), 2019-07-01
DOI: 10.1109/IIAI-AAI.2019.00018 (https://doi.org/10.1109/IIAI-AAI.2019.00018)
Citations: 17
Abstract
Air pollution has a serious impact on daily life. The public needs timely air quality information in order to take protective measures in advance. Machine learning methods such as random forest are well suited to classifying air quality grades. We find that the class distribution of air quality data is imbalanced, which degrades the performance of random forest classifiers. To address this problem, we propose a random forest method based on a grouped-sample bootstrap. We then design three sets of experiments to evaluate the proposed method. The experimental results indicate that the proposed method improves on the standard random forest when both are applied to balanced datasets. The improvement is especially pronounced on imbalanced datasets, where the new method is much better at classifying minority-class samples.
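The abstract does not spell out how the grouped bootstrap works, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that "groups" are the class labels and that each tree's bootstrap sample is drawn with replacement inside every group, so that minority classes always appear in the sample. The function name `grouped_bootstrap`, the `per_group` parameter, and the toy air quality labels are all hypothetical.

```python
import random
from collections import defaultdict


def grouped_bootstrap(labels, per_group=None, seed=None):
    """Return bootstrap indices drawn with replacement inside each class group.

    Assumption (not from the paper): every group contributes the same number
    of draws, defaulting to the size of the largest group.
    """
    rng = random.Random(seed)

    # Collect the row indices that belong to each class label.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)

    if per_group is None:
        per_group = max(len(members) for members in groups.values())

    # Resample each group separately so no class can vanish from the sample.
    sample = []
    for label in sorted(groups):
        sample.extend(rng.choices(groups[label], k=per_group))
    return sample


# Toy imbalanced label vector: 90 / 8 / 2 samples per grade (made-up numbers).
labels = ["good"] * 90 + ["moderate"] * 8 + ["unhealthy"] * 2
indices = grouped_bootstrap(labels, seed=0)

counts = {}
for i in indices:
    counts[labels[i]] = counts.get(labels[i], 0) + 1
print(counts)  # {'good': 90, 'moderate': 90, 'unhealthy': 90}
```

In a forest, each tree would be trained on the rows selected by one such call with a different seed. A plain bootstrap over the same 100 labels would, by contrast, omit the two "unhealthy" samples from roughly 13% of trees.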