Balanced Underbagged Ensemble Approach for Classifying Highly Imbalanced Datasets in the Insurance and Financial Sectors

IF 3.7 Q1 Economics, Econometrics and Finance

Intelligent Systems in Accounting, Finance and Management. Pub Date: 2025-10-13. DOI: 10.1002/isaf.70018

Alberto Gutierrez-Gallego, Oscar Garnica, Daniel Parra, J. Manuel Velasco, J. Ignacio Hidalgo


Citations: 0

Abstract



Data bias is a critical challenge in machine learning applications within the financial and insurance sectors, as it can lead to misleading risk assessments and inaccurate predictive models. A prevalent source of bias in real-world datasets is the imbalanced distribution of classes, which is particularly problematic in fraud detection, credit risk assessment, and claim prediction. Traditional approaches to handling imbalanced data often rely on undersampling or oversampling techniques. However, these methods may generate unrealistic minority class samples or fail to perform effectively when dealing with extreme class imbalances. In this paper, we propose a configurable technique based on the underbagging method, integrated with a classifier for highly imbalanced datasets. Our approach is designed to enhance the predictive accuracy of the minority class while maintaining robust performance for the majority class. We incorporate our methodology into a classification ensemble framework and evaluate its effectiveness by comparing it against 100 combinations of 10 different oversampling and undersampling techniques applied to 10 different machine learning algorithms. The evaluation is conducted on two highly imbalanced real-world datasets: one related to auto insurance claims and another focused on credit card fraud detection. Our statistical analysis demonstrates that Balanced Underbagged Ensemble achieves superior classification performance in terms of recall for both classes, regardless of the base machine learning model used within the ensemble. Furthermore, our method finds an optimal balance between classification performance and computational efficiency.
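The abstract does not spell out the algorithm, so the following is only a minimal sketch of generic underbagging, not the authors' Balanced Underbagged Ensemble. It assumes binary labels, a toy nearest-centroid base learner, synthetic data, and majority voting; every function name and parameter (`fit_underbagging`, `ratio`, `n_estimators`, and so on) is hypothetical and chosen for illustration. Each ensemble member sees all minority rows plus a random majority subset, and the result is scored by recall for each class, the metric the abstract reports.

```python
import random
from collections import Counter


def fit_centroid(X, y):
    """Toy base learner (an assumption): store the mean vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def fit_underbagging(X, y, minority=1, n_estimators=11, ratio=1.0, seed=0):
    """Train n_estimators learners, each on all minority rows plus a random
    majority subset of size ratio * (number of minority rows)."""
    rng = random.Random(seed)
    min_idx = [i for i, t in enumerate(y) if t == minority]
    maj_idx = [i for i, t in enumerate(y) if t != minority]
    k = min(len(maj_idx), max(1, int(ratio * len(min_idx))))
    models = []
    for _ in range(n_estimators):
        idx = min_idx + rng.sample(maj_idx, k)  # undersample the majority class
        models.append(fit_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_underbagging(models, x):
    """Combine the ensemble members by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]


def recall_per_class(y_true, y_pred):
    """Recall for each class: correctly predicted rows / all rows of that class."""
    recalls = {}
    for label in set(y_true):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total
    return recalls


def make_data(n_major, n_minor, rng):
    """Synthetic 2-D data: majority near (0, 0), minority near (3, 3)."""
    X = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(n_major)]
    X += [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(n_minor)]
    y = [0] * n_major + [1] * n_minor
    return X, y


# Demo on a 50:1 imbalanced toy set (scored on the training data for brevity).
X, y = make_data(1000, 20, random.Random(42))
models = fit_underbagging(X, y, minority=1, n_estimators=11, ratio=1.0, seed=0)
y_pred = [predict_underbagging(models, x) for x in X]
recalls = recall_per_class(y, y_pred)
print(recalls)
```

The `ratio` argument stands in for the kind of configurability the abstract mentions: raising it gives each learner more majority rows, which would be expected to trade minority recall for majority recall. In practice the toy learner would be swapped for a real classifier and the scoring done on held-out data.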


Source journal

Intelligent Systems in Accounting, Finance and Management (Economics, Econometrics and Finance: Finance)

CiteScore

6.00

Self-citation rate

0.00%

Articles published

Journal description: Intelligent Systems in Accounting, Finance and Management is a quarterly international journal which publishes original, high quality material dealing with all aspects of intelligent systems as they relate to the fields of accounting, economics, finance, marketing and management. In addition, the journal is also concerned with related emerging technologies, including big data, business intelligence, social media and other technologies. It encourages the development of novel technologies, and the embedding of new and existing technologies into applications of real, practical value. Therefore, implementation issues are of as much concern as development issues. The journal is designed to appeal to academics in the intelligent systems, emerging technologies and business fields, as well as to advanced practitioners who wish to improve the effectiveness, efficiency, or economy of their working practices. A special feature of the journal is the use of two groups of reviewers, those who specialize in intelligent systems work, and also those who specialize in application areas. Reviewers are asked to address issues of originality and actual or potential impact on research, teaching, or practice in the accounting, finance, or management fields. Authors working on conceptual developments or on laboratory-based explorations of data sets therefore need to address the issue of potential impact at some level in submissions to the journal.