Weak-Supervision for Prolonged Hospital Length of Stay Prediction
Ariana J. Mann, N. Bambos
2022 IEEE International Conference on E-health Networking, Application & Services (HealthCom), published 2022-10-17
DOI: 10.1109/HealthCom54947.2022.9982748 (https://doi.org/10.1109/HealthCom54947.2022.9982748)
Citation count: 1
Abstract
Predicting whether a patient will have a prolonged length of stay (LoS) once admitted to a hospital can help ensure that medical resources are allocated where they are needed most. However, prior work on classifying prolonged-LoS patients defines a prolonged-LoS as a stay longer than a single, flat number-of-days cutoff. Using a flat cutoff means that the classification occurs without reference to a baseline LoS, fails to control for any covariates, and is generally effective only for a specific medical subgroup. Instead, in this work, we introduce an approach in which the algorithm designer specifies a LoS percentile to be used as the cutoff for prolonged-LoS. In a method known as weak-supervision, we use the LoS percentile cutoff to train a model that produces the labels subsequently used to train the classifier. In contrast to a number-of-days cutoff, the LoS percentile cutoff coupled with weak-supervision provides what we claim is a more principled and flexible approach to defining what constitutes a prolonged-LoS. Specifically, we train a quantile regression model to predict the designated LoS percentile value for each patient, which importantly allows us to control for covariates across which access to medical care should be equalized (such as primary medical condition, hospital facility, and admission time of day). The regression output is cast as a noisy binary label for prolonged-LoS, which is then used to train a machine learning model for prolonged-LoS classification. We empirically demonstrate that this weak-supervision-based approach provides usable classification performance despite using noisy labels.
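The label-generation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single categorical covariate (primary condition), for which a quantile regression on group indicators reduces to the per-group empirical quantile, and it uses hypothetical synthetic data. The group names, the 7-day flat cutoff, and the percentile `TAU = 0.9` are all assumptions chosen for illustration. The downstream classifier that would be trained on the noisy labels is omitted.

```python
import math
import random
from collections import defaultdict


def empirical_quantile(values, tau):
    """Smallest observed value with at least a tau fraction of values <= it."""
    ordered = sorted(values)
    k = max(0, math.ceil(tau * len(ordered)) - 1)
    return ordered[k]


def fit_group_quantiles(records, tau):
    """Per-group LoS cutoff; a stand-in for a quantile regression on covariates."""
    by_group = defaultdict(list)
    for group, los in records:
        by_group[group].append(los)
    return {g: empirical_quantile(v, tau) for g, v in by_group.items()}


def weak_labels(records, cutoffs):
    """Noisy binary label: 1 if the observed LoS exceeds the group's cutoff."""
    return [1 if los > cutoffs[group] else 0 for group, los in records]


def positive_rate(records, labels):
    """Share of patients labeled prolonged-LoS within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for (group, _), label in zip(records, labels):
        totals[group] += 1
        positives[group] += label
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical synthetic data: (primary condition, LoS in days).
random.seed(0)
records = [("cardiac", random.expovariate(1 / 6.0)) for _ in range(500)]
records += [("orthopedic", random.expovariate(1 / 2.0)) for _ in range(500)]

TAU = 0.9  # designer-chosen LoS percentile (assumed value)
cutoffs = fit_group_quantiles(records, TAU)
labels = weak_labels(records, cutoffs)
percentile_rate = positive_rate(records, labels)

# For contrast: a flat 7-day cutoff (assumed value) applied to every patient.
flat_labels = [1 if los > 7.0 else 0 for _, los in records]
flat_rate = positive_rate(records, flat_labels)
```

In this sketch `percentile_rate` is close to `1 - TAU` in both groups by construction, whereas `flat_rate` differs sharply between groups. This is the covariate dependence that the flat cutoff fails to control for.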