A Data-Driven Approach to Predict Default Risk of Loan for Online Peer-to-Peer (P2P) Lending
Authors: Yu Jin, Yu Zhu
Published in: 2015 Fifth International Conference on Communication Systems and Network Technologies (2015-04-04)
DOI: https://doi.org/10.1109/CSNT.2015.25
Online peer-to-peer (P2P) lending has grown explosively in recent years and can benefit both sides of lending between individuals. This study proposes a data mining (DM) approach to predict the performance of a P2P loan before it is funded. Using data from Lending Club, we explore the characteristics of loans and their applicants and use a random forest for feature selection in the modeling phase. Unlike other risk prediction models, our approach classifies the predicted outcome into three or four categories rather than just two (default and non-default). We then compare five DM models: two decision trees (DTs), two neural networks (NNs), and one support vector machine (SVM). Two metrics are used to evaluate the prediction results: average percent hit rate and the area of the cumulative lift curve. The empirical results show that loan term, annual income, loan amount, debt-to-income ratio, credit grade, and revolving line utilization play an important role in loan defaults. The prediction performance of the SVM, the Classification and Regression Tree (CART), and the multi-layer perceptron (MLP) is almost equal.
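The abstract names the area of the cumulative lift curve as an evaluation metric but does not define it. The sketch below shows one common reading, offered as an assumption rather than the authors' exact procedure: rank loans by predicted score for a class, trace the fraction of that class captured as more of the ranked list is inspected, and integrate with the trapezoidal rule. The function names, the one-vs-rest treatment of multiple categories, and the class label used in the usage note are all hypothetical.

```python
def cumulative_lift_area(y_true, scores):
    """Area under the cumulative gains (lift) curve for one class.

    y_true: list of 0/1 flags (1 = loan belongs to the class of interest).
    scores: list of predicted scores for that class (higher = more likely).
    Assumption: this is one common definition, not taken from the paper.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n = len(y_true)
    total_pos = sum(y_true)
    if n == 0 or total_pos == 0:
        raise ValueError("need at least one sample of the class of interest")

    # Rank samples from highest to lowest predicted score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    area = 0.0
    captured = 0
    prev_y = 0.0
    for i in order:
        captured += y_true[i]
        y = captured / total_pos           # fraction of the class found so far
        area += (prev_y + y) / 2.0 / n     # trapezoid of width 1/n
        prev_y = y
    return area


def per_class_lift_areas(labels, class_scores):
    """One-vs-rest lift areas for a multi-category outcome.

    labels: list of true class labels (hypothetical names).
    class_scores: dict mapping each class label to its list of scores.
    """
    return {
        c: cumulative_lift_area([1 if l == c else 0 for l in labels], s)
        for c, s in class_scores.items()
    }
```

Under this definition a random ranking gives an area of about 0.5, and a perfect ranking gives 1 - p/2, where p is the share of loans in the class. For example, `cumulative_lift_area([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])` returns 0.75 because both positives are ranked first and p = 0.5.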