{"title":"不平衡数据的增量集成学习模型:以信用评分为例","authors":"My Thi Thien Bui","doi":"10.55579/jaec.202372.407","DOIUrl":null,"url":null,"abstract":"Imbalanced data is a challenge for classification models. It reduces the overall performance of traditional learning algorithms. Besides, the minority class of imbalanced datasets is misclassified with a high ratio even though this is a crucial object of the classification process. In this paper, a new model called the Lasso-Logistic ensemble is proposed to deal with imbalanced data by utilizing two popular techniques, random over-sampling and random under-sampling. The model was applied to two real imbalanced credit data sets. The results show that the Lasso-Logistic ensemble model offers better performance than the single traditional methods, such as random over-sampling, random under-sampling, Synthetic Minority Oversampling Technique (SMOTE), and cost-sensitive learning.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly cited.","PeriodicalId":250655,"journal":{"name":"J. Adv. Eng. Comput.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Incremental Ensemble Learning Model for Imbalanced Data: a Case Study of Credit Scoring\",\"authors\":\"My Thi Thien Bui\",\"doi\":\"10.55579/jaec.202372.407\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Imbalanced data is a challenge for classification models. It reduces the overall performance of traditional learning algorithms. Besides, the minority class of imbalanced datasets is misclassified with a high ratio even though this is a crucial object of the classification process. In this paper, a new model called the Lasso-Logistic ensemble is proposed to deal with imbalanced data by utilizing two popular techniques, random over-sampling and random under-sampling. The model was applied to two real imbalanced credit data sets. The results show that the Lasso-Logistic ensemble model offers better performance than the single traditional methods, such as random over-sampling, random under-sampling, Synthetic Minority Oversampling Technique (SMOTE), and cost-sensitive learning.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly cited.\",\"PeriodicalId\":250655,\"journal\":{\"name\":\"J. Adv. Eng. Comput.\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"J. Adv. Eng. Comput.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.55579/jaec.202372.407\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Adv. Eng. Comput.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.55579/jaec.202372.407","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Incremental Ensemble Learning Model for Imbalanced Data: a Case Study of Credit Scoring
Imbalanced data is a challenge for classification models. It reduces the overall performance of traditional learning algorithms. Besides, the minority class of imbalanced datasets is misclassified with a high ratio even though this is a crucial object of the classification process. In this paper, a new model called the Lasso-Logistic ensemble is proposed to deal with imbalanced data by utilizing two popular techniques, random over-sampling and random under-sampling. The model was applied to two real imbalanced credit data sets. The results show that the Lasso-Logistic ensemble model offers better performance than the single traditional methods, such as random over-sampling, random under-sampling, Synthetic Minority Oversampling Technique (SMOTE), and cost-sensitive learning.This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium provided the original work is properly cited.