Authors: Xialin Wang, Yanying Li, Jiaoni Zhang
Journal: Expert Systems with Applications, Volume 262, Article 125595 (published 2024-10-28)
Impact factor: 7.5; JCR: Q1 (Computer Science, Artificial Intelligence)
DOI: 10.1016/j.eswa.2024.125595
URL: https://www.sciencedirect.com/science/article/pii/S095741742402462X
Weighted ensemble based on differentiated sampling rates for imbalanced classification and application to credit risk assessment
Imbalanced data classification is an important research topic in machine learning, because class imbalance strongly affects the classification performance of learning algorithms. In this research direction, designing an effective sampling strategy for imbalanced data is a challenging task. Although many methods have been proposed to classify imbalanced data, the problem remains open. A method that reflects the data distribution and removes noisy samples can be expected to obtain good classification results. Therefore, this paper proposes a weighted ensemble algorithm based on differentiated sampling rates (KSDE) and applies it to the field of credit risk assessment. KSDE first removes noisy samples using an outlier detection technique. Then, multiple balanced training subsets are generated with differentiated sampling rates, and a submodel is trained on each subset. These training subsets sufficiently represent the distribution of the data. Finally, the well-performing submodels are weighted and integrated to obtain the prediction result. We conducted comprehensive experiments to validate the performance of the proposed method, comparing it against 12 state-of-the-art methods on 23 datasets. KSDE outperforms the recently proposed SPE (Self-paced Ensemble) by 12.46% in terms of TPR (True Positive Rate). In addition, KSDE achieves good results on 7 credit risk datasets. The experimental results show that the proposed method is competitive in solving the imbalanced data classification problem.
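The three stages named in the abstract (noise removal, balanced subsets drawn at different sampling rates, weighted combination of well-performing submodels) can be illustrated with a minimal sketch. This is not the authors' KSDE implementation: the abstract does not specify the outlier detector, the base learner, the sampling rates, or the weighting rule, so everything below is an assumption chosen for brevity. In particular, `drop_outliers` (distance to the class centroid), `NearestCentroid`, the `rates` tuple, the `min_score` threshold, and the balanced-accuracy weights are all hypothetical stand-ins.

```python
import random
from statistics import mean

def centroid(rows):
    return [mean(col) for col in zip(*rows)]

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def drop_outliers(rows, keep_frac=0.9):
    # Assumed stand-in for outlier detection: keep the rows closest to the class centroid.
    c = centroid(rows)
    ranked = sorted(rows, key=lambda r: dist2(r, c))
    return ranked[:max(1, int(len(ranked) * keep_frac))]

class NearestCentroid:
    # Assumed base learner: predicts 1 (minority) if x is closer to the minority centroid.
    def fit(self, pos, neg):
        self.cp, self.cn = centroid(pos), centroid(neg)
        return self

    def predict(self, x):
        return 1 if dist2(x, self.cp) <= dist2(x, self.cn) else 0

def tpr(model, pos):
    # True positive rate: TP / (TP + FN), measured on minority samples only.
    return sum(model.predict(x) for x in pos) / len(pos)

def tnr(model, neg):
    # True negative rate: TN / (TN + FP), measured on majority samples only.
    return sum(1 - model.predict(x) for x in neg) / len(neg)

def fit_ensemble(pos, neg, val_pos, val_neg,
                 rates=(0.5, 0.75, 1.0), min_score=0.5, seed=0):
    rng = random.Random(seed)
    pos, neg = drop_outliers(pos), drop_outliers(neg)
    members = []
    for rate in rates:
        # Each rate sets the subset size; both classes get n rows, so the subset is balanced.
        n = max(1, min(len(pos), len(neg), int(rate * len(pos))))
        model = NearestCentroid().fit(rng.sample(pos, n), rng.sample(neg, n))
        # Assumed weighting rule: balanced accuracy on held-out data.
        score = (tpr(model, val_pos) + tnr(model, val_neg)) / 2
        if score >= min_score:  # keep only well-performing submodels
            members.append((score, model))
    if not members:
        raise ValueError("no submodel reached min_score")
    return members

def predict(members, x):
    # Weighted vote over the retained submodels.
    total = sum(w for w, _ in members)
    vote = sum(w * m.predict(x) for w, m in members)
    return 1 if vote >= total / 2 else 0

# Synthetic, well-separated toy data: 20 minority rows versus 200 majority rows.
_rng = random.Random(1)
def _cloud(center, n):
    return [[_rng.gauss(center, 0.5), _rng.gauss(center, 0.5)] for _ in range(n)]

train_pos, train_neg = _cloud(5, 20), _cloud(0, 200)
val_pos, val_neg = _cloud(5, 10), _cloud(0, 50)
members = fit_ensemble(train_pos, train_neg, val_pos, val_neg)
```

Calling `predict(members, x)` on a new point then returns the weighted-vote label. In a real credit risk setting the toy centroid learner and centroid-distance filter would presumably be replaced by a stronger classifier and a proper outlier detector; the sketch only shows how the three stages fit together.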
About the journal:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.