Credit scoring model for fintech lending: An integration of large language models and FocalPoly loss
Authors: Yufei Xia, Zhiyin Han, Yawen Li, Lingyun He
Journal: International Journal of Forecasting, Vol. 41, No. 3, Pages 894-919 (Impact Factor 6.9; JCR Q1, Economics)
Publication date: 2024-12-05
DOI: 10.1016/j.ijforecast.2024.07.005
URL: https://www.sciencedirect.com/science/article/pii/S0169207024000724
Fintech lending experiences high credit risk and needs an efficient credit scoring model, but it also faces limited data sources and severe class imbalance. We develop a novel two-stage credit scoring model (called LLM-FP-CatBoost) by solving the two issues simultaneously. Large language models (LLMs) initially extract narrative data as a supplementary credit dataset. A new FocalPoly loss is then incorporated with CatBoost to handle the class imbalance problem. Extensive comparisons demonstrate that the proposed LLM-FP-CatBoost significantly outperforms the benchmarks in most circumstances. When making pairwise comparisons between LLMs on the fintech lending dataset, we found that the Chinese-specific LLM, i.e., ERNIE 4.0, achieves the best overall performance, followed by GPT-4 and BERT-based models. The performance decomposition reveals that the superiority is mainly attributed to the new data source extracted by the LLMs. The SHAP algorithm further ensures the interpretability of LLM-FP-CatBoost. The superiority of the proposed LLM-FP-CatBoost model remains robust to hyperparameters of the loss function, specific LLMs, and other extraction methods of narrative data. Finally, we discuss some managerial implications concerning credit scoring in fintech lending.
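The abstract does not give the formula for the FocalPoly loss, so the following is only a rough sketch of the general idea it names: a focal loss, which down-weights easy examples under class imbalance, combined with a first-order polynomial (Poly-1 style) correction term. The function name `focal_poly1_loss`, the exact form of the loss, and the default values of `gamma`, `alpha`, and `epsilon` are all assumptions made for illustration and may differ from the authors' definition.

```python
import math


def focal_poly1_loss(y_true, p_pred, gamma=2.0, alpha=0.25, epsilon=1.0):
    """Per-sample focal loss plus a Poly-1 style term (illustrative only).

    Assumed form, NOT taken from the paper:
        focal = -alpha_t * (1 - p_t)**gamma * log(p_t)
        loss  = focal + epsilon * (1 - p_t)**(gamma + 1)
    where p_t is the predicted probability of the true class.
    """
    tiny = 1e-12
    # Clip the probability to avoid log(0).
    p = min(max(p_pred, tiny), 1.0 - tiny)
    # Probability assigned to the true class, and its class weight.
    p_t = p if y_true == 1 else 1.0 - p
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    focal = -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    poly1 = epsilon * (1.0 - p_t) ** (gamma + 1.0)
    return focal + poly1


if __name__ == "__main__":
    # A confidently correct prediction versus a confidently wrong one
    # for a positive (e.g. default) sample.
    for p in (0.9, 0.5, 0.1):
        print(p, focal_poly1_loss(1, p))
```

Under these assumptions, well-classified samples contribute very little to the total loss, so training concentrates on hard, typically minority-class, cases. Plugging such a loss into CatBoost would additionally require its first and second derivatives with respect to the raw score, which this sketch omits.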
About the journal:
The International Journal of Forecasting is a leading journal in its field that publishes high-quality refereed papers. It aims to bridge the gap between theory and practice, making forecasting useful and relevant for decision makers and policymakers. The journal places strong emphasis on empirical studies, evaluation activities, implementation research, and improving the practice of forecasting. It welcomes various points of view and encourages debate to find solutions to field-related problems. The journal is the official publication of the International Institute of Forecasters (IIF) and is indexed in Sociological Abstracts, Journal of Economic Literature, Statistical Theory and Method Abstracts, INSPEC, Current Contents, UMI Data Courier, RePEc, Academic Journal Guide, CIS, IAOR, and Social Sciences Citation Index.