Research of Imbalanced Classification Based on Cascade Forest
M. Shi, Fangxin Lin, Ying Qian, Liang Dou
2021 IEEE International Conference on Progress in Informatics and Computing (PIC), 2021-12-17
DOI: 10.1109/PIC53636.2021.9687091
Citations: 0
Abstract
With the rapid development of science, the volume of data is growing exponentially, and machine learning and data mining offer unprecedented opportunities. Classification is one of the most common data processing methods, but the diversity of data poses a major challenge. Among these challenges, class imbalance is attracting increasing attention, and a number of strategies and improved versions of existing algorithms have been proposed. gcForest is an ensemble learning algorithm proposed by Professor Zhou Zhihua in 2017. Its advantages include few hyperparameters, suitability for small-scale datasets, and strong representational ability. However, the algorithm is not optimized for imbalanced classification. Inspired by how other ensemble learning algorithms have been adapted to imbalanced data, this paper applies a variety of undersampling strategies to the cascade forest of gcForest. Experiments on a variety of typical imbalanced datasets show that the resulting method achieves performance better than or comparable to that of current advanced algorithms for imbalanced learning.
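The abstract does not say which undersampling strategies were used or exactly where in the cascade they are applied. The sketch below is one plausible reading, written in dependency-free Python and assuming plain random undersampling: every estimator in every cascade level is trained on its own class-balanced random subset, and, as in gcForest's cascade, each level's class-probability vectors are concatenated with the raw features and passed to the next level. To keep it short, the random forests and completely-random tree forests of gcForest are replaced by a hypothetical `CentroidClassifier`. The sketch also omits gcForest's k-fold generation of class vectors and automatic level growth, using a fixed `n_levels` instead. All function and class names are illustrative and are not taken from the paper.

```python
import math
import random
from collections import Counter


def random_undersample(X, y, rng):
    """Randomly keep, for every class, as many samples as the smallest class has."""
    n_min = min(Counter(y).values())
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    Xb, yb = [], []
    for label in sorted(by_class):
        for xi in rng.sample(by_class[label], n_min):
            Xb.append(xi)
            yb.append(label)
    return Xb, yb


class CentroidClassifier:
    """Hypothetical stand-in base learner (NOT a random forest): class scores
    decay with distance to each class centroid and are normalised to sum to 1."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.centroids = []
        for c in self.classes:
            rows = [x for x, lbl in zip(X, y) if lbl == c]
            self.centroids.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, x):
        scores = [1.0 / (1.0 + math.dist(x, c)) for c in self.centroids]
        total = sum(scores)
        return [s / total for s in scores]


def augment(X, feats, level):
    """Concatenate raw features with every estimator's class vector (gcForest-style)."""
    return [list(x) + [p for est in level for p in est.predict_proba(f)]
            for x, f in zip(X, feats)]


def fit_cascade(X, y, n_levels=2, n_estimators=4, seed=0):
    """Train a fixed-depth cascade; each estimator sees its own balanced subset."""
    rng = random.Random(seed)
    levels, feats = [], [list(x) for x in X]
    for _ in range(n_levels):
        level = []
        for _ in range(n_estimators):
            Xb, yb = random_undersample(feats, y, rng)
            level.append(CentroidClassifier().fit(Xb, yb))
        levels.append(level)
        feats = augment(X, feats, level)  # input for the next level
    return levels


def predict_cascade(levels, x):
    """Propagate one sample through the cascade; average the last level's vectors."""
    feats = list(x)
    for level in levels[:-1]:
        feats = augment([x], [feats], level)[0]
    last = levels[-1]
    probs = [est.predict_proba(feats) for est in last]
    avg = [sum(ps) / len(last) for ps in zip(*probs)]
    return last[0].classes[max(range(len(avg)), key=avg.__getitem__)]


if __name__ == "__main__":
    # Toy imbalanced data: 36 majority points near the origin, 4 minority points near (5, 5).
    X = [(i * 0.1, j * 0.1) for i in range(6) for j in range(6)]
    X += [(5.0, 5.0), (5.2, 5.1), (4.9, 5.3), (5.1, 4.8)]
    y = [0] * 36 + [1] * 4
    model = fit_cascade(X, y)
    print(predict_cascade(model, (5.0, 5.0)), predict_cascade(model, (0.2, 0.2)))  # prints: 1 0
```

Because each estimator draws a different balanced subset, no single learner is dominated by the majority class, while the ensemble as a whole still sees many different majority-class samples. In a closer reproduction, `CentroidClassifier` would be swapped for forest classifiers and the class vectors would be produced by k-fold cross-validation, as in the original gcForest design.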