Title: A Model Based on Survival-based Credit Risk Assessment System of SMEs
Authors: Jia Chen, Chunjie Wang, R. Leone
Published in: Proceedings of the 14th International Conference on Computer Modeling and Simulation, 2022-06-24
DOI: 10.1145/3547578.3547615 (https://doi.org/10.1145/3547578.3547615)
Citations: 0
Abstract
Assessing the credit risk of small and medium-sized enterprises (SMEs) from their strength and reputation gives banks an important basis for credit decisions. We establish a credit risk assessment system for SMEs based on the Nonparametric Maximum Likelihood Estimation (NPMLE) of the survival function, which handles Case Ⅱ interval-censored data. An empirical analysis is carried out on a real data set of 425 Chinese SMEs. First, the data are preprocessed and three main factors are considered: enterprise strength, development potential, and the supply and demand relationships with upstream and downstream partners. We then extract appropriate features based on those factors. The association between the credit risk of SMEs and the extracted features is discussed, and variable selection is implemented by Random Forest. Credit risk is then predicted by Double Random Forest (DRF), a competitive ensemble method for classification and prediction. Predictive accuracy is evaluated with ROC curves and confusion matrices, and the results show that DRF outperforms Support Vector Machine (SVM) and Random Forest (RF). Combining the prediction outcomes with the features yields the intervals into which the repayment times fall. The data set is therefore Case Ⅱ interval-censored, i.e., the exact time of repayment is unknown and we only know that it falls within an interval. Risk assessment based on Case Ⅱ interval-censored data has received little attention in the literature. Prepayment, on-time payment, and late payment are all considered. The NPMLE is applied to estimate how the probability of repayment varies with time. Survival curves are drawn for SMEs with different credit ratings, thus establishing a credit risk assessment system for SMEs.
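To illustrate the NPMLE step for interval-censored repayment times, the following is a minimal sketch of the classical self-consistency (EM) iteration commonly attributed to Turnbull. The abstract does not state which algorithm the authors use to compute the NPMLE, so this is an assumption. The function name `turnbull_npmle`, the grid of candidate cells built from all observed endpoints, and the example repayment windows are hypothetical and are not taken from the paper.

```python
import math


def turnbull_npmle(intervals, tol=1e-8, max_iter=10_000):
    """Self-consistency (EM) estimate for interval-censored event times.

    intervals: list of (left, right) pairs meaning the event time lies in
    (left, right], with left < right; right may be math.inf.
    Returns (cells, mass, survival), where mass[j] is the estimated
    probability of cell j and survival[j] estimates P(T > cells[j][1]).
    """
    for left, right in intervals:
        if not left < right:
            raise ValueError("each interval needs left < right")

    # Candidate cells are the gaps between consecutive distinct endpoints.
    grid = sorted({x for pair in intervals for x in pair})
    cells = list(zip(grid[:-1], grid[1:]))

    # inside[i] lists the cells contained in subject i's observed interval.
    inside = [
        [j for j, (a, b) in enumerate(cells) if left <= a and b <= right]
        for left, right in intervals
    ]

    n, m = len(intervals), len(cells)
    mass = [1.0 / m] * m  # start from a uniform distribution over cells
    for _ in range(max_iter):
        new_mass = [0.0] * m
        for idx in inside:
            # E-step: share one observation across its cells in
            # proportion to the current mass; M-step: average over subjects.
            total = sum(mass[j] for j in idx)
            for j in idx:
                new_mass[j] += mass[j] / total / n
        change = max(abs(a - b) for a, b in zip(new_mass, mass))
        mass = new_mass
        if change < tol:
            break

    survival, running = [], 0.0
    for pj in mass:
        running += pj
        survival.append(max(0.0, 1.0 - running))
    return cells, mass, survival


if __name__ == "__main__":
    # Hypothetical repayment windows in days; math.inf marks "not yet repaid".
    windows = [(0, 30), (0, 30), (30, 60), (30, 90), (60, math.inf)]
    for (a, b), s in zip(*turnbull_npmle(windows)[::2]):
        print(f"P(repayment time > {b}) is about {s:.3f}")
```

Running the estimator separately on the SMEs of each credit rating would give one survival curve per rating, which is the kind of comparison the abstract describes. This sketch spreads mass over every cell between consecutive endpoints rather than only Turnbull's innermost intervals, which keeps the code short at the cost of slower convergence.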