Authors: Ahmad Hassan, Nazish Yousaf
Venue: 2022 International Conference on Frontiers of Information Technology (FIT)
Publication date: 2022-12-01
DOI: 10.1109/FIT57066.2022.00029 (https://doi.org/10.1109/FIT57066.2022.00029)
Bankruptcy Prediction using Diverse Machine Learning Algorithms
Bankruptcy prediction is a major research area in finance and accounting. Its importance stems from the need to assess the risk to a business’s stability. Financial instability prediction aims to develop a predictive model that combines multiple econometric parameters to forecast a firm’s future financial situation. This paper documents our observations while exploring, constructing, and comparing nine widely used classification models for bankruptcy prediction: extreme gradient boosting, decision trees, random forests, quadratic discriminant analysis, neural networks, adaptive boosting, Gaussian naïve Bayes, balanced bagging, and logistic regression. We focus on the bankruptcy dataset of Polish companies, whose statistical features are curated collections created using synthetic features. We begin with data preprocessing and exploratory analysis, imputing missing values with popular imputation techniques: mean imputation, k-nearest neighbours (k-NN), expectation maximization (EM), and multivariate imputation by chained equations (MICE). To address class imbalance, we oversample the minority class using the synthetic minority oversampling technique (SMOTE). We then fit the aforementioned models to the imputed and resampled datasets using k-fold cross-validation. This yields 36 analyses (nine models on each of four imputed datasets); we evaluate the models’ performance on the validation datasets and rank them accordingly.
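The oversampling step mentioned above can be illustrated with a minimal, standard-library-only sketch of SMOTE-style interpolation. This is not the authors' implementation, which the abstract does not specify; the function name `smote`, its parameters, and the toy data are hypothetical and chosen only for illustration. Each synthetic row is placed on the line segment between a randomly chosen minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a sample has at most len - 1 neighbours

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a minority sample at random
        j = rng.choice(neighbours[i])      # pick one of its nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, z = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(x, z)])
    return synthetic


# Toy data (made up): two features, 8 majority rows versus 3 minority rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]

# Create enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
print(len(majority), len(balanced_minority))  # prints "8 8"
```

Because every synthetic row is a convex combination of two real minority rows, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows exactly.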