Sebastian H. Goldmann, Marcos R. Machado, Joerg R. Osterrieder
{"title":"Advancing credit risk assessment in the retail banking industry: A hybrid approach using time series and supervised learning models","authors":"Sebastian H. Goldmann, Marcos R. Machado, Joerg R. Osterrieder","doi":"10.1016/j.datak.2025.102490","DOIUrl":null,"url":null,"abstract":"<div><div>Credit risk assessment remains a central challenge in retail banking, with conventional models often falling short in predictive accuracy and adaptability to granular customer behavior. This study explores the potential of Time Series Classification (TSC) algorithms to enhance credit risk modeling by analyzing customers’ historical end-of-day balance data. We compare traditional Machine Learning (ML) models – including Logistic Regression and XGBoost – with advanced TSC methods such as Shapelets, Long Short-Term Memory (LSTM) networks, and Canonical Interval Forests (CIF). Our results show that TSC algorithms, particularly CIF and Shapelet-based methods, significantly outperform traditional approaches. When using CIF-derived Probability of Default (PD) estimates as additional features in an XGBoost model, predictive performance improved notably: the combined model achieved an Area under the Curve (AUC) of 0.81, compared to 0.79 for CIF alone and 0.77 for XGBoost without the CIF input. These findings underscore the value of integrating temporal features into credit risk assessment frameworks. Moreover, the complementary strengths of the TSC and XGBoost models across different Receiver Operating Characteristic (ROC) curve regions demonstrate the practical benefits of model stacking. However, performance dropped when using aggregated monthly data, highlighting the importance of preserving high-frequency behavioral signals. This research contributes to more accurate, interpretable, and robust credit risk models and offers a pathway for banks to leverage time series data for improved risk forecasting.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"160 ","pages":"Article 102490"},"PeriodicalIF":2.7000,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data & Knowledge Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169023X25000850","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Credit risk assessment remains a central challenge in retail banking, with conventional models often falling short in predictive accuracy and adaptability to granular customer behavior. This study explores the potential of Time Series Classification (TSC) algorithms to enhance credit risk modeling by analyzing customers’ historical end-of-day balance data. We compare traditional Machine Learning (ML) models – including Logistic Regression and XGBoost – with advanced TSC methods such as Shapelets, Long Short-Term Memory (LSTM) networks, and Canonical Interval Forests (CIF). Our results show that TSC algorithms, particularly CIF and Shapelet-based methods, significantly outperform traditional approaches. When using CIF-derived Probability of Default (PD) estimates as additional features in an XGBoost model, predictive performance improved notably: the combined model achieved an Area under the Curve (AUC) of 0.81, compared to 0.79 for CIF alone and 0.77 for XGBoost without the CIF input. These findings underscore the value of integrating temporal features into credit risk assessment frameworks. Moreover, the complementary strengths of the TSC and XGBoost models across different Receiver Operating Characteristic (ROC) curve regions demonstrate the practical benefits of model stacking. However, performance dropped when using aggregated monthly data, highlighting the importance of preserving high-frequency behavioral signals. This research contributes to more accurate, interpretable, and robust credit risk models and offers a pathway for banks to leverage time series data for improved risk forecasting.
期刊介绍:
Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.