{"title":"IGAMT: Privacy-Preserving Electronic Health Record Synthesization with Heterogeneity and Irregularity.","authors":"Wenjie Wang, Pengfei Tang, Jian Lou, Yuanming Shao, Lance Waller, Yi-An Ko, Li Xiong","doi":"10.1609/aaai.v38i14.29491","DOIUrl":null,"url":null,"abstract":"<p><p>Utilizing electronic health records (EHR) for machine learning-driven clinical research has great potential to enhance outcome predictions and treatment personalization. Nonetheless, due to privacy and security concerns, the secondary use of EHR data is regulated, constraining researchers' access to EHR data. Generating synthetic EHR data with deep learning methods is a viable and promising approach to mitigate privacy concerns, offering not only a supplementary resource for downstream applications but also sidestepping the privacy risks associated with real patient data. While prior efforts have concentrated on EHR data synthesis, significant challenges persist: addressing the heterogeneity of features including temporal and non-temporal features, structurally missing values, and irregularity of the temporal measures, and ensuring rigorous privacy of the real data used for model training. Existing works in this domain only focused on solving one or two aforementioned challenges. In this work, we propose <i>IGAMT</i>, an innovative framework to generate privacy-preserved synthetic EHR data that not only maintains high quality with heterogeneous features, missing values, and irregular measures but also achieves differential privacy with enhanced privacy-utility trade-off. Extensive experiments prove that <i>IGAMT</i> significantly outperforms baseline and state-of-the-art models in terms of resemblance to real data and performance of downstream applications. Ablation studies also prove the effectiveness of the techniques applied in <i>IGAMT</i>.</p>","PeriodicalId":74506,"journal":{"name":"Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence","volume":"38 14","pages":"15634-15643"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11606572/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1609/aaai.v38i14.29491","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/3/24 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Utilizing electronic health records (EHR) for machine learning-driven clinical research has great potential to enhance outcome predictions and treatment personalization. Nonetheless, due to privacy and security concerns, the secondary use of EHR data is regulated, constraining researchers' access to EHR data. Generating synthetic EHR data with deep learning methods is a viable and promising approach to mitigate privacy concerns, offering not only a supplementary resource for downstream applications but also sidestepping the privacy risks associated with real patient data. While prior efforts have concentrated on EHR data synthesis, significant challenges persist: addressing the heterogeneity of features including temporal and non-temporal features, structurally missing values, and irregularity of the temporal measures, and ensuring rigorous privacy of the real data used for model training. Existing works in this domain only focused on solving one or two aforementioned challenges. In this work, we propose IGAMT, an innovative framework to generate privacy-preserved synthetic EHR data that not only maintains high quality with heterogeneous features, missing values, and irregular measures but also achieves differential privacy with enhanced privacy-utility trade-off. Extensive experiments prove that IGAMT significantly outperforms baseline and state-of-the-art models in terms of resemblance to real data and performance of downstream applications. Ablation studies also prove the effectiveness of the techniques applied in IGAMT.