Bankruptcy Prediction using Diverse Machine Learning Algorithms

2022 International Conference on Frontiers of Information Technology (FIT). Publication date: 2022-12-01. DOI: 10.1109/FIT57066.2022.00029

Ahmad Hassan, Nazish Yousaf


Citations: 0

Abstract

Bankruptcy prediction is a large field within finance and accounting science. Its importance stems from its role in determining the risk to a business’s stability. Financial instability prediction aims to develop a predictive model that incorporates multiple econometric parameters to forecast a firm’s future financial situation. This research documents our observations while exploring, constructing, and comparing several widely used classification models that apply to bankruptcy prediction. The models are extreme gradient boosting, decision trees, random forests, quadratic discriminant analysis, neural networks, adaptive boosting, Gaussian naïve Bayes, balanced bagging, and logistic regression. Our focus is the bankruptcy dataset of Polish companies, whose statistical features are curated collections created using synthetic features. We start with data preprocessing and exploratory analysis. This stage involves imputing missing values with popular techniques: mean imputation, k-nearest neighbours (K-NN), expectation maximization (EM), and multivariate imputation by chained equations (MICE). To address the class imbalance, we oversample the minority class using the synthetic minority oversampling technique (SMOTE). We then model the data using k-fold cross-validation of the aforementioned models on the imputed and resampled datasets. Finally, we obtain 36 different analyses (nine models on four differently imputed datasets). We assess and evaluate the models’ performance on the validation datasets and rank the models accordingly.
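The abstract describes a pipeline of three stages: missing-value imputation, SMOTE oversampling, and k-fold cross-validation. What follows is a minimal, dependency-free Python sketch of that kind of pipeline. It is not the authors' code, and it rests on several assumptions:

- Only mean imputation is shown; the K-NN, EM, and MICE variants are omitted.
- A hypothetical nearest-centroid classifier stands in for the nine models evaluated in the paper.
- All helper names and the toy data are invented for illustration.
- SMOTE is applied only inside each training fold, so synthetic samples never reach the validation fold. The abstract does not say at which point the authors resample.

In practice, scikit-learn (`SimpleImputer`, `KNNImputer`, `IterativeImputer`, `StratifiedKFold`) and imbalanced-learn (`SMOTE`, `BalancedBaggingClassifier`) provide tested implementations of these pieces.

```python
from __future__ import annotations

import math
import random

def column_means(rows: list[list[float | None]]) -> list[float]:
    """Mean of the observed (non-None) values in each column."""
    means = []
    for col in zip(*rows):
        observed = [v for v in col if v is not None]
        # Assumption: a column with no observed values falls back to 0.0.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def fill_missing(rows: list[list[float | None]], means: list[float]) -> list[list[float]]:
    """Mean imputation: replace each None with its column's precomputed mean."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def smote(minority: list[list[float]], n_new: int, k_neighbors: int,
          rng: random.Random) -> list[list[float]]:
    """SMOTE: each synthetic sample lies on the segment between a random minority
    sample and one of its k nearest minority-class neighbours (needs >= 2 samples)."""
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x within the minority class, excluding x itself.
        neighbours = sorted((m for m in minority if m is not x),
                            key=lambda m: math.dist(x, m))[:k_neighbors]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic

def stratified_folds(labels: list[int], k: int, rng: random.Random) -> list[list[int]]:
    """Split row indices into k folds, dealing each class out round-robin."""
    folds: list[list[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def fit_centroids(X: list[list[float]], y: list[int]) -> dict[int, list[float]]:
    """Hypothetical stand-in classifier: one centroid per class."""
    centroids = {}
    for cls in set(y):
        members = [x for x, t in zip(X, y) if t == cls]
        centroids[cls] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(centroids: dict[int, list[float]], x: list[float]) -> int:
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda cls: math.dist(x, centroids[cls]))

def cross_validate(rows: list[list[float | None]], labels: list[int],
                   k: int = 5, seed: int = 0) -> float:
    """Stratified k-fold CV; imputation and SMOTE are fitted on the training fold
    only. Returns mean recall on the minority class (label 1 = bankrupt)."""
    rng = random.Random(seed)
    recalls = []
    for val_idx in stratified_folds(labels, k, rng):
        val_set = set(val_idx)
        train_idx = [i for i in range(len(rows)) if i not in val_set]
        # Column means come from the training fold and are applied to both folds.
        means = column_means([rows[i] for i in train_idx])
        X_train = fill_missing([rows[i] for i in train_idx], means)
        y_train = [labels[i] for i in train_idx]
        X_val = fill_missing([rows[i] for i in val_idx], means)
        y_val = [labels[i] for i in val_idx]
        # Oversample the minority class in the training fold up to class parity.
        minority = [x for x, t in zip(X_train, y_train) if t == 1]
        n_new = y_train.count(0) - len(minority)
        if n_new > 0:
            X_train += smote(minority, n_new, k_neighbors=5, rng=rng)
            y_train += [1] * n_new
        model = fit_centroids(X_train, y_train)
        hits = [predict(model, x) for x, t in zip(X_val, y_val) if t == 1]
        recalls.append(sum(hits) / len(hits))  # assumes each fold holds >= 1 bankrupt firm
    return sum(recalls) / len(recalls)

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 60 healthy firms (label 0), 12 bankrupt firms (label 1),
    # two features each, with roughly 10% of values missing.
    rows, labels = [], []
    for label, centre, count in [(0, 0.0, 60), (1, 3.0, 12)]:
        for _ in range(count):
            row = [rng.gauss(centre, 1.0) for _ in range(2)]
            rows.append([None if rng.random() < 0.1 else v for v in row])
            labels.append(label)
    print(f"mean minority-class recall: {cross_validate(rows, labels):.2f}")
```

Running a loop like this for each of nine models on each of four imputed datasets yields 9 × 4 = 36 model/dataset combinations, matching the count in the abstract. The sketch reports recall on the bankrupt class rather than plain accuracy. On an imbalanced dataset, accuracy can look high even when few bankruptcies are detected.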