Imbalanced Learning for Hospital Readmission Prediction using National Readmission Database

2020 IEEE International Conference on Knowledge Graph (ICKG). Published: 2020-08-01. DOI: 10.1109/ICBK50248.2020.00026

Shuwen Wang, Magdalyn E. Elkin, Xingquan Zhu


Citations: 3

Abstract



In this paper, we propose using imbalanced learning for hospital readmission prediction. The goal is to predict, from a patient's current hospital visit records, whether that patient is likely to be readmitted within 30 days of being discharged from the current visit. Hospital readmission prediction poses two main challenges: (1) readmission visits (i.e., the positive class) make up only a small portion of all hospital visits, which creates a severe class imbalance problem for learning; and (2) due to privacy and health regulations, the information available for characterizing patients is limited, often to payment-level information only. Even so, this payment-level information involves over 80,000 procedure codes, which makes learning a high-dimensional, highly sparse problem. Motivated by these challenges, we design an imbalanced learning strategy that creates features from each patient's hospital visit by combining patient demographic information, ICD-10 Clinical Modification (CM) diagnosis codes and Procedure Coding System (PCS) codes, and Clinical Classifications Software Refined (CCSR) conversion. Instead of characterizing patients directly with ICD-10-CM/PCS codes, we convert each patient's visit into the CCSR code space, which is a much smaller feature space. By using random sampling to balance the class distribution in the training set, our method achieves good performance in predicting patient readmission.
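The abstract names two technical steps: collapsing raw ICD-10-CM/PCS codes into the much smaller CCSR category space alongside demographic features, and random sampling to balance the training set. The Python sketch below illustrates both under stated assumptions. `TOY_CCSR_MAP`, its category labels, `visit_features`, `random_oversample`, `random_undersample`, and the toy visits are hypothetical illustrations, not the authors' code. Real code-to-category mappings come from AHRQ's published CCSR files. The abstract also does not say whether the authors oversampled the minority class, undersampled the majority class, or used another random-sampling variant, so both basic variants are shown.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lookup from ICD-10-CM codes to CCSR-style categories.
# Real mappings come from the AHRQ HCUP CCSR files; the category names here
# are made-up placeholders. ICD-10-PCS codes would be mapped the same way.
TOY_CCSR_MAP: Dict[str, str] = {
    "I50.9": "CCSR_HEART_FAILURE",
    "I50.23": "CCSR_HEART_FAILURE",
    "E11.9": "CCSR_DIABETES",
    "J44.1": "CCSR_COPD",
}
# Fixed column order for the category indicator features.
CATEGORIES: List[str] = sorted(set(TOY_CCSR_MAP.values()))


def visit_features(age: int, is_female: bool, codes: Sequence[str]) -> List[float]:
    """Demographic features followed by one 0/1 indicator per CCSR category.

    Several raw codes can collapse into one category, so the vector length is
    2 + len(CATEGORIES) rather than the number of distinct ICD-10 codes.
    Codes missing from the lookup are silently ignored (an assumption).
    """
    present = {TOY_CCSR_MAP[c] for c in codes if c in TOY_CCSR_MAP}
    demographics = [float(age), 1.0 if is_female else 0.0]
    return demographics + [1.0 if cat in present else 0.0 for cat in CATEGORIES]


def _split_by_class(y: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (minority_indices, majority_indices) for binary 0/1 labels."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def random_oversample(
    X: Sequence[List[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Duplicate randomly chosen minority rows (with replacement) until both
    classes have the same size. Intended for the training split only."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def random_undersample(
    X: Sequence[List[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Keep a random subset of majority rows (without replacement) whose size
    matches the minority class. Intended for the training split only."""
    rng = random.Random(seed)
    minority, majority = _split_by_class(y)
    idx = minority + rng.sample(majority, len(minority))
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


if __name__ == "__main__":
    # Made-up toy visits: (age, is_female, codes, readmitted_within_30_days)
    visits = [
        (67, True, ["I50.9", "E11.9"], 1),
        (45, False, ["J44.1"], 0),
        (72, False, ["E11.9"], 0),
        (58, True, ["I50.23"], 0),
        (81, True, ["UNMAPPED"], 0),  # not in the toy lookup, so ignored
    ]
    X = [visit_features(age, female, codes) for age, female, codes, _ in visits]
    y = [v[3] for v in visits]
    X_bal, y_bal = random_oversample(X, y, seed=42)
    print(len(y_bal), sum(y_bal))  # prints: 8 4
```

Consistent with the abstract's wording, resampling is applied to the training set only; a held-out test set would keep its natural class ratio. Oversampling keeps every majority-class visit at the cost of repeated positives, while undersampling discards majority-class visits but yields a smaller training set.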

