Credit scoring model for fintech lending: An integration of large language models and FocalPoly loss

IF 6.9; Zone 2, Economics; JCR Q1, Economics
Yufei Xia, Zhiyin Han, Yawen Li, Lingyun He
International Journal of Forecasting, volume 41, issue 3, pages 894-919. Published 2024-12-05.
DOI: 10.1016/j.ijforecast.2024.07.005
URL: https://www.sciencedirect.com/science/article/pii/S0169207024000724
Citations: 0

Abstract
Fintech lending experiences high credit risk and needs an efficient credit scoring model, but it also faces limited data sources and severe class imbalance. We develop a novel two-stage credit scoring model (called LLM-FP-CatBoost) by solving the two issues simultaneously. Large language models (LLMs) initially extract narrative data as a supplementary credit dataset. A new FocalPoly loss is then incorporated with CatBoost to handle the class imbalance problem. Extensive comparisons demonstrate that the proposed LLM-FP-CatBoost significantly outperforms the benchmarks in most circumstances. When making pairwise comparisons between LLMs on the fintech lending dataset, we found that the Chinese-specific LLM, i.e., ERNIE 4.0, achieves the best overall performance, followed by GPT-4 and BERT-based models. The performance decomposition reveals that the superiority is mainly attributed to the new data source extracted by the LLMs. The SHAP algorithm further ensures the interpretability of LLM-FP-CatBoost. The superiority of the proposed LLM-FP-CatBoost model remains robust to hyperparameters of the loss function, specific LLMs, and other extraction methods of narrative data. Finally, we discuss some managerial implications concerning credit scoring in fintech lending.
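The abstract does not state the formula of the FocalPoly loss. Purely as an illustration of how a focal-style loss and a polynomial correction term can be combined for imbalanced binary classification, the sketch below adds a Poly-1 style term to the standard focal loss. The function name `focal_poly1_loss`, the default hyperparameters, and the exact form of the polynomial term are assumptions and may differ from the loss defined in the paper.

```python
import math


def focal_poly1_loss(p, y, gamma=2.0, alpha=0.25, epsilon=1.0):
    """Per-sample focal loss plus a Poly-1 style term (illustrative only).

    p       : predicted probability of the positive class (e.g. default)
    y       : true label, 1 for positive and 0 for negative
    gamma   : focusing parameter; larger values down-weight easy samples
    alpha   : weight on the positive class (1 - alpha on the negative class)
    epsilon : weight of the added polynomial term
    All defaults are hypothetical, not taken from the paper.
    """
    # Clip to avoid log(0).
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    # Probability assigned to the true class, and its class weight.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Standard focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    focal = -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    # Poly-1 style correction: epsilon * (1 - p_t)^(gamma + 1).
    poly1 = epsilon * (1.0 - p_t) ** (gamma + 1.0)
    return focal + poly1


if __name__ == "__main__":
    # A confident correct prediction is penalized less than a confident wrong one.
    print(focal_poly1_loss(0.9, 1), focal_poly1_loss(0.1, 1))
```

With `gamma=0`, `alpha=1` and `epsilon=0` the function reduces to ordinary cross-entropy for a positive sample, which is a quick sanity check on the implementation. Using such a loss inside a gradient-boosting library would additionally require its first and second derivatives with respect to the raw score, which this sketch does not provide.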
Source journal
CiteScore: 17.10
Self-citation rate: 11.40%
Articles published: 189
Review time: 77 days
About the journal: The International Journal of Forecasting is a leading journal in its field that publishes high quality refereed papers. It aims to bridge the gap between theory and practice, making forecasting useful and relevant for decision and policy makers. The journal places strong emphasis on empirical studies, evaluation activities, implementation research, and improving the practice of forecasting. It welcomes various points of view and encourages debate to find solutions to field-related problems. The journal is the official publication of the International Institute of Forecasters (IIF) and is indexed in Sociological Abstracts, Journal of Economic Literature, Statistical Theory and Method Abstracts, INSPEC, Current Contents, UMI Data Courier, RePEc, Academic Journal Guide, CIS, IAOR, and Social Sciences Citation Index.