Machine learning based prediction tool of hospitalization cost
B. Abdelmoula, M. Khemakhem, N. Abdelmoula
2021 22nd International Arab Conference on Information Technology (ACIT), published 2021-12-21
DOI: 10.1109/acit53391.2021.9677110
Abstract
The increase in the cost of healthcare is a worldwide challenge. It has thus become essential to understand the nature and weight of the factors that influence it, and to anticipate how it will change, in order to ensure good governance, improve the hospital management of material and financial resources, and be prepared for emergencies such as the ongoing global pandemic. Using the Python programming language, different supervised machine learning algorithms were tested on a dataset extracted from the digital medical records of patients hospitalized in the infectious diseases department at Sfax University Hospital (Tunisia). After the collected data had been processed and analyzed, different models for predicting a patient's hospitalization cost upon admission were built and evaluated. The dataset initially comprised 542 observations and 136 variables: 36 quantitative variables and 100 dummy variables. Two variable selection methods were applied, and subgroups of independent variables with different semantic meanings were also used. Despite a few shortcomings, such as missing data, the most precise of the tested prediction models was a 15th-degree multiple linear regression. The regressors were the season in which the hospitalization took place, the suspected diagnosis, and patient characteristics such as gender. Applied in practice, this tool would make it possible to predict hospitalization costs and therefore to forecast budgets precisely. However, technical improvements are still needed to optimize the quality of this tool, and other algorithms could be tested to broaden the scope of the study. Wider implementation and use of well-developed digital medical records would make it possible to build more complete databases, from which better prediction models could be generated.
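The abstract does not say how the 15th-degree model was constructed or which library was used. A common reading of a degree-d multiple linear regression is ordinary least squares fitted on every monomial of the regressors up to total degree d, and the sketch below assumes that reading. It uses only the Python standard library so that it is self-contained; the function names (`polynomial_features`, `solve`, `fit_polynomial_regression`, `predict`) are hypothetical and not taken from the paper, and the toy data is synthetic. In practice the same expansion and fit are commonly done with scikit-learn's `PolynomialFeatures` and `LinearRegression`.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree):
    """Expand one observation into all monomials of total degree 0..degree.

    The first entry (degree 0) is the intercept column, a constant 1.0.
    """
    feats = [1.0]
    for d in range(1, degree + 1):
        # Each multiset of feature indices of size d is one monomial, e.g. (0, 0, 1) -> x0*x0*x1.
        for idx in combinations_with_replacement(range(len(row)), d):
            feats.append(float(prod(row[i] for i in idx)))
    return feats


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied as floats so the inputs are not modified.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("near-zero pivot: feature columns are (nearly) collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_polynomial_regression(rows, targets, degree):
    """Ordinary least squares on polynomial features via the normal equations X^T X b = X^T y."""
    X = [polynomial_features(row, degree) for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row, degree):
    """Evaluate the fitted polynomial for one observation."""
    return sum(c * f for c, f in zip(coefs, polynomial_features(row, degree)))


if __name__ == "__main__":
    # Toy, synthetic data (not from the paper): y = 1 + 2x + 3x^2 exactly.
    xs = list(range(6))
    ys = [1 + 2 * x + 3 * x * x for x in xs]
    coefs = fit_polynomial_regression([[x] for x in xs], ys, degree=2)
    print([round(c, 6) for c in coefs])
    print(predict(coefs, [10], degree=2))
```

With p regressors and degree d, the expansion produces C(p + d, d) columns including the intercept, so the design matrix grows quickly with d. For 0/1 dummy variables x^k = x, so higher powers of a single dummy duplicate its column; only products of distinct dummies add new terms, and duplicated columns make X^T X singular, so a real pipeline would drop them before fitting (the sketch's solver raises `ValueError` when it meets a near-zero pivot). The normal equations are used here for brevity; at high degrees the design matrix becomes badly conditioned, and a QR- or SVD-based least-squares solver is the safer choice.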