Jin Xiao, Yadong Wang, Jing Chen, Ling Xie, Jing Huang
Title: Impact of resampling methods and classification models on the imbalanced credit scoring problems
Journal: Information Sciences, Volume 569, Pages 508-526
Publication date: 2021-08-01
Publication type: Journal Article
DOI: 10.1016/j.ins.2021.05.029
URL: https://www.sciencedirect.com/science/article/pii/S0020025521004874
Impact factor: 6.8
JCR category: Computer Science, Information Systems
Citations: 25
Abstract
For imbalanced credit scoring, the most common solution is to balance the class distribution of the training set with a resampling method, then train a classification model and classify the customer samples in the test set. However, it is still difficult to select the most appropriate resampling methods and classification models, and their optimal combinations have not been identified. Therefore, this study proposes a new benchmark model comparison framework for imbalanced credit scoring. In the framework, we introduce the index of balanced accuracy and four other evaluation measures, experimentally compare the performance of 10 benchmark resampling methods and nine benchmark classification models on six credit scoring data sets, and analyze their optimal combinations. The experimental results show: (1) among benchmark resampling methods, random under-sampling (a traditional resampling method) and the synthetic minority over-sampling technique combined with Wilson’s edited nearest neighbor (an intelligent resampling method) perform best; (2) among benchmark classification models, logistic regression (a single classification model) and adaptive boosting (an ensemble classification model) perform best; (3) among the combinations, random under-sampling combined with random subspace (an ensemble classification model) obtains the most satisfactory credit scoring performance.
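As a rough illustration of two ingredients named in the abstract, the sketch below shows random under-sampling of a training set and one common formulation of the index of balanced accuracy, IBA = (1 + α·(TPR − TNR))·TPR·TNR. This is not the authors' code. The function names, the choice of the minority class as the positive class, and α = 0.1 are assumptions made for the example; the paper's own settings are not given here.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly drop rows until every class has as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_keep = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_keep))
    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


def index_of_balanced_accuracy(y_true, y_pred, positive=1, alpha=0.1):
    """IBA = (1 + alpha * (TPR - TNR)) * TPR * TNR; alpha=0.1 is an assumed value."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    return (1 + alpha * (tpr - tnr)) * tpr * tnr


# Toy data (hypothetical): 2 minority-class customers, 8 majority-class customers.
X = [[i] for i in range(10)]
y = [1, 1] + [0] * 8
X_bal, y_bal = random_under_sample(X, y, seed=42)
print(Counter(y_bal))  # both classes now have 2 rows each
print(index_of_balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

Only the training set is resampled in this workflow; the test set keeps its original class distribution so that the evaluation measures reflect the real imbalance.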
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. It also features a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.