Corporate credit scoring method based on unlabeled data and multi-source data
Authors: Yunhong Xu, Yitong Chen, Li Sun, Yu Chen
Journal: Decision Support Systems, vol. 198, Article 114543 (impact factor 6.8; JCR Q1, Computer Science, Artificial Intelligence)
Published: 2025-09-16
DOI: 10.1016/j.dss.2025.114543
URL: https://www.sciencedirect.com/science/article/pii/S0167923625001447
Citations: 0
Abstract
Unlabeled data and multi-source data give the financial industry new opportunities to improve credit scoring accuracy. When using unlabeled data, existing credit scoring methods are often unreliable because of improper clustering or noise introduced when predicting labels. When using multi-source data, existing methods based on federated learning frameworks rely on a single global model and therefore fail to tailor models to the different data distributions of different data sources. Moreover, recent studies have explored the individual value of unlabeled data and multi-source data, but they often fail to use both together. To address these issues, we propose UMDCS (Unlabeled and Multi-Source data Driven Credit Scoring), a self-supervised credit scoring method that uses unlabeled and multi-source data simultaneously. To exploit unlabeled data, we propose a novel sample masking function that generates pseudo-labels for unlabeled data, and we pre-train the encoder on the resulting pretext tasks. To exploit multi-source data, we employ a horizontal federated learning framework that aggregates local encoders into a global encoder while preserving data privacy. The global encoder is concatenated with personalized predictors to form a personalized credit scoring model for each data source. Five experiments and statistical significance tests show that UMDCS outperforms the baseline methods.
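The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the masking function, the aggregation rule, or the model architecture. The sketch assumes random feature masking in which the binary mask itself serves as the pseudo-label, sample-size-weighted averaging (FedAvg-style) of encoder weights, and flat weight vectors in place of real networks. The names `mask_sample`, `fed_avg`, and `build_personalized_models` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


def mask_sample(x: Vector, p_mask: float, rng: random.Random) -> Tuple[Vector, List[int]]:
    """Corrupt one unlabeled sample and return (corrupted sample, mask).

    Assumption: masked features are set to zero and the binary mask is the
    pseudo-label for the pretext task. The paper's masking function may differ.
    """
    mask = [1 if rng.random() < p_mask else 0 for _ in x]
    corrupted = [0.0 if m else v for v, m in zip(x, mask)]
    return corrupted, mask


def fed_avg(local_encoders: List[Vector], n_samples: List[int]) -> Vector:
    """Average local encoder weights, weighted by each source's sample count.

    Assumption: FedAvg-style aggregation; only weights leave each data source.
    """
    total = sum(n_samples)
    dim = len(local_encoders[0])
    return [
        sum(w[i] * n for w, n in zip(local_encoders, n_samples)) / total
        for i in range(dim)
    ]


def build_personalized_models(
    global_encoder: Vector, local_predictors: Dict[str, Vector]
) -> Dict[str, Dict[str, Vector]]:
    """Pair the shared global encoder with each source's own predictor."""
    return {
        source: {"encoder": list(global_encoder), "predictor": list(predictor)}
        for source, predictor in local_predictors.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    corrupted, mask = mask_sample([0.4, 1.2, -0.7, 3.0], p_mask=0.5, rng=rng)
    print(corrupted, mask)

    # Two data sources with 100 and 300 samples respectively.
    global_encoder = fed_avg([[1.0, 2.0], [3.0, 4.0]], n_samples=[100, 300])
    print(global_encoder)  # [2.5, 3.5]

    models = build_personalized_models(
        global_encoder, {"bank_a": [0.1], "bank_b": [-0.2]}
    )
    print(sorted(models))  # ['bank_a', 'bank_b']
```

In this sketch only the encoder is shared across sources, while each predictor stays local. That is one simple way to read the abstract's distinction between a global encoder and personalized predictors.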
About the journal:
The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs).