K. Shakhgeldyan, B. Geltser, V. Rublev, Basil Shirobokov, Dan Geltser, A. Kriger
Proceedings of the 4th International Conference on Computer Science and Application Engineering, published 2020-10-20. DOI: 10.1145/3424978.3425090
Feature Selection Strategy for Intrahospital Mortality Prediction after Coronary Artery Bypass Graft Surgery on an Unbalanced Sample
The aim of the study was to develop models for predicting intrahospital mortality (IHM) in an unbalanced sample of patients with coronary artery disease (CAD) after coronary artery bypass graft (CABG) surgery.

Methods. The IHM prediction models were built from an analysis of 866 electronic case histories of CAD patients who underwent revascularization by CABG. The cohort consisted of two groups: the first included 35 (4%) patients who died within the first 30 days after CABG, and the second included 831 (96%) patients with a favorable surgical outcome. We analyzed 99 factors, including the results of clinical, laboratory, and instrumental examinations obtained before CABG. Features were selected with classical filter methods and model-based selection (the wrapper method). The main obstacle to applying this classical approach was the class imbalance, since one group comprised only 4% of subjects. Under these conditions, it was not possible to apply cross-validation with three types of samples, standard quality metrics, or multi-category factors.

Results. We proposed a feature search approach based on a multi-stage selection procedure that combines validation of predefined predictors, filter methods, and multifactor model development using logistic regression, random forest (RF), and artificial neural networks (ANNs). Model accuracy was evaluated with a combined quality metric. The RF- and ANN-based models not only produced more accurate forecasting tools but also helped verify five additional IHM predictors.
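The effect of the imbalance on validation can be made concrete with the reported cohort sizes (35 deaths, 831 survivors). The sketch below is an illustration only, not the authors' procedure: a stdlib-only stratified k-fold assignment that counts how many deaths land in each held-out fold. The function names `stratified_folds` and `split_by_percent`, the fold counts k = 5 and k = 10, and the 60/20/20 train/validation/test proportions are hypothetical choices made for this example.

```python
import random

# Cohort sizes reported in the abstract (1 = died within 30 days, 0 = survived).
N_DEATHS, N_SURVIVORS = 35, 831


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps the overall class ratio.

    Hypothetical stdlib-only sketch; libraries such as scikit-learn provide
    equivalent, better-tested splitters.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # Deal each class round-robin across the folds; carrying the offset over
        # between classes keeps total fold sizes within one sample of each other.
        for j, idx in enumerate(idxs):
            folds[(j + offset) % k].append(idx)
        offset += len(idxs)
    return folds


def split_by_percent(n, percents):
    """Split n items into integer parts by percentage; the remainder goes to the first part."""
    parts = [n * p // 100 for p in percents]
    parts[0] += n - sum(parts)
    return parts


labels = [1] * N_DEATHS + [0] * N_SURVIVORS
print(f"minority share: {N_DEATHS / len(labels):.1%}")

for k in (5, 10):
    folds = stratified_folds(labels, k)
    deaths_per_fold = [sum(labels[i] for i in fold) for fold in folds]
    print(f"{k}-fold CV, deaths per held-out fold: {deaths_per_fold}")

# Hypothetical 60/20/20 train/validation/test split of the death group.
print("deaths in train/validation/test:", split_by_percent(N_DEATHS, (60, 20, 20)))
```

With these inputs, every held-out fold in 5-fold cross-validation contains exactly 7 deaths, 10-fold cross-validation leaves 3 or 4 deaths per fold, and a 60/20/20 split leaves only 7 deaths each in the validation and test samples. With so few positive cases per evaluation set, a single misclassified death shifts sensitivity by about 14 percentage points (1/7), which is one plausible reason why three-sample validation and standard quality metrics were hard to apply to this cohort.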