基于注意力的电子健康记录表格数据缺失值估算。

IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics Pub Date : 2024-06-01 Epub Date: 2024-08-22 DOI:10.1109/ichi61247.2024.00030

Ibna Kowsar, Shourav B Rabbani, Manar D Samad

{"title":"基于注意力的电子健康记录表格数据缺失值估算。","authors":"Ibna Kowsar, Shourav B Rabbani, Manar D Samad","doi":"10.1109/ichi61247.2024.00030","DOIUrl":null,"url":null,"abstract":"The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","volume":"2024 ","pages":"177-182"},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999/pdf/","citationCount":"0","resultStr":"{\"title\":\"Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.\",\"authors\":\"Ibna Kowsar, Shourav B Rabbani, Manar D Samad\",\"doi\":\"10.1109/ichi61247.2024.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.\",\"PeriodicalId\":73284,\"journal\":{\"name\":\"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics\",\"volume\":\"2024 \",\"pages\":\"177-182\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ichi61247.2024.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/8/22 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ichi61247.2024.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/22 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

电子健康记录表格数据中缺失值的估算（IMV）对于机器学习进行特定患者预测建模至关重要。虽然生物统计学和最近的机器学习领域都开发了缺失值估算方法，但基于深度学习的解决方案在学习表格数据方面的成功率有限。本文提出了一种新颖的基于注意力的缺失值估算框架，它能利用特征间（自我注意力）或样本间注意力学习重建缺失值数据。我们采用了对比学习中使用的数据处理方法，以提高训练有素的估算模型的泛化能力。所提出的自我注意力估算方法优于最先进的统计和基于机器学习（决策树）的估算方法，在五个表格数据集上将归一化均方根误差降低了 18.4% 到 74.7%，在两个电子健康记录数据集上将归一化均方根误差降低了 52.6% 到 82.6%。当数值完全随机缺失时，所提出的基于注意力的缺失值估算方法在很大的缺失率范围（10% 到 50%）内都表现出了卓越的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.

The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics

自引率

0.00%

发文量