Impact of resampling methods and classification models on the imbalanced credit scoring problems

Impact factor 6.8 · CAS Zone 1 (Computer Science) · JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS
Jin Xiao, Yadong Wang, Jing Chen, Ling Xie, Jing Huang
Information Sciences, vol. 569, pp. 508-526, published 2021-08-01 (journal article).
DOI: 10.1016/j.ins.2021.05.029
URL: https://www.sciencedirect.com/science/article/pii/S0020025521004874
Citations: 25

Abstract


For imbalanced credit scoring, the most common solution is to balance the class distribution of the training set with a resampling method, and then train a classification model and classify the customer samples in the test set. However, it is still difficult to select the most appropriate resampling methods and classification models, and their optimal combinations have not been identified. Therefore, this study proposes a new benchmark model comparison framework for imbalanced credit scoring. In the framework, we introduce the index of balanced accuracy and four other evaluation measures, experimentally compare the performance of 10 benchmark resampling methods and nine benchmark classification models on six credit scoring data sets, and analyze their optimal combinations. The experimental results show: (1) among the benchmark resampling methods, random under-sampling (a traditional resampling method) and the synthetic minority over-sampling technique combined with Wilson's edited nearest neighbor (an intelligent resampling method) perform best; (2) among the benchmark classification models, logistic regression (a single classification model) and adaptive boosting (an ensemble classification model) perform best; (3) as for the optimal combinations, random under-sampling combined with random subspace (an ensemble classification model) obtains the most satisfactory credit scoring performance.
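Two building blocks named in the abstract, random under-sampling of the training set and the index of balanced accuracy (IBA), can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `random_under_sample` and `index_balanced_accuracy` are hypothetical, the minority (defaulting) class is assumed to be labelled `1`, and `alpha = 0.1` is an assumed value (the one commonly used for IBA), since the abstract does not state the setting used in the paper. The sketch uses the usual definition IBA = (1 + α·(TPR − TNR))·TPR·TNR, i.e. the squared G-mean weighted by a dominance term.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly drop samples from larger classes until every class has as
    many samples as the smallest class. Apply to the TRAINING set only."""
    rng = random.Random(seed)
    n_min = min(Counter(y).values())
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    X_out, y_out = [], []
    for label, rows in by_class.items():
        # Sample without replacement, keeping n_min rows per class.
        for xi in rng.sample(rows, n_min):
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def index_balanced_accuracy(y_true, y_pred, positive=1, alpha=0.1):
    """IBA_alpha = (1 + alpha * (TPR - TNR)) * TPR * TNR.
    positive=1 (bad customers) and alpha=0.1 are assumptions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn)  # recall on the minority (bad) class
    tnr = tn / (tn + fp)  # recall on the majority (good) class
    return (1 + alpha * (tpr - tnr)) * tpr * tnr


if __name__ == "__main__":
    # Toy data: 2 bad customers (label 1) and 8 good customers (label 0).
    X = [[i] for i in range(10)]
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    X_bal, y_bal = random_under_sample(X, y, seed=42)
    print(Counter(y_bal))  # two samples per class

    y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
    print(round(index_balanced_accuracy(y, y_pred), 4))  # 0.4211
```

In the toy example TPR = 0.5 and TNR = 0.875, so the dominance term slightly penalizes a classifier that favours the majority class. Tested library implementations of these steps exist, for example `RandomUnderSampler` and `SMOTEENN` in imbalanced-learn.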

Source journal: Information Sciences
Subject category: Engineering & Technology, Computer Science: Information Systems
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months
Journal description: Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an esteemed international journal that publishes original and creative research findings in the field of information sciences, along with a limited number of timely tutorial and survey contributions. The journal aims to serve a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varied backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.