{"title":"基于注意力的电子健康记录表格数据缺失值估算。","authors":"Ibna Kowsar, Shourav B Rabbani, Manar D Samad","doi":"10.1109/ichi61247.2024.00030","DOIUrl":null,"url":null,"abstract":"<p><p>The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.</p>","PeriodicalId":73284,"journal":{"name":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999/pdf/","citationCount":"0","resultStr":"{\"title\":\"Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.\",\"authors\":\"Ibna Kowsar, Shourav B Rabbani, Manar D Samad\",\"doi\":\"10.1109/ichi61247.2024.00030\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.</p>\",\"PeriodicalId\":73284,\"journal\":{\"name\":\"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11463999/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ichi61247.2024.00030\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/8/22 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Conference on Healthcare Informatics. IEEE International Conference on Healthcare Informatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ichi61247.2024.00030","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/8/22 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
Attention-based Imputation of Missing Values in Electronic Health Records Tabular Data.
The imputation of missing values (IMV) in electronic health records tabular data is crucial to enable machine learning for patient-specific predictive modeling. While IMV methods are developed in biostatistics and recently in machine learning, deep learning-based solutions have shown limited success in learning tabular data. This paper proposes a novel attention-based missing value imputation framework that learns to reconstruct data with missing values leveraging between-feature (self-attention) or between-sample attentions. We adopt data manipulation methods used in contrastive learning to improve the generalization of the trained imputation model. The proposed self-attention imputation method outperforms state-of-the-art statistical and machine learning-based (decision-tree) imputation methods, reducing the normalized root mean squared error by 18.4% to 74.7% on five tabular data sets and 52.6% to 82.6% on two electronic health records data sets. The proposed attention-based missing value imputation method shows superior performance across a wide range of missingness (10% to 50%) when the values are missing completely at random.