MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation.

Yuan Zhong, Suhan Cui, Jiaqi Wang, Xiaochen Wang, Ziyi Yin, Yaqing Wang, Houping Xiao, Mengdi Huai, Ting Wang, Fenglong Ma
{"title":"MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation.","authors":"Yuan Zhong, Suhan Cui, Jiaqi Wang, Xiaochen Wang, Ziyi Yin, Yaqing Wang, Houping Xiao, Mengdi Huai, Ting Wang, Fenglong Ma","doi":"10.1137/1.9781611978032.58","DOIUrl":null,"url":null,"abstract":"<p><p>Health risk prediction aims to forecast the potential health risks that patients may face using their historical Electronic Health Records (EHR). Although several effective models have developed, data insufficiency is a key issue undermining their effectiveness. Various data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through learning underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability. The source code is available via https://shorturl.at/aerT0.</p>","PeriodicalId":74533,"journal":{"name":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","volume":"2024 ","pages":"499-507"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11469648/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... SIAM International Conference on Data Mining. SIAM International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/1.9781611978032.58","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Health risk prediction aims to forecast the potential health risks that patients may face using their historical Electronic Health Records (EHR). Although several effective models have developed, data insufficiency is a key issue undermining their effectiveness. Various data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through learning underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability. The source code is available via https://shorturl.at/aerT0.

MedDiffusion:通过基于扩散的数据扩增提升健康风险预测。
健康风险预测旨在利用患者的历史电子健康记录(EHR)预测患者可能面临的潜在健康风险。虽然已经开发出了一些有效的模型,但数据不足是影响其有效性的关键问题。为了缓解这一问题,人们引入了各种数据生成和增强方法,通过学习基础数据分布来扩大训练数据集的规模。然而,由于这些方法的设计与任务无关,其性能往往受到限制。为了解决这些缺陷,本文介绍了一种新颖的、基于端到端扩散的风险预测模型,命名为 MedDiffusion。它通过在训练过程中创建合成患者数据来扩大样本空间,从而提高风险预测性能。此外,MedDiffusion 还利用逐步关注机制来识别患者就诊之间的隐藏关系,使模型能够自动保留最重要的信息,从而生成高质量的数据。在四个真实世界医疗数据集上进行的实验评估表明,MedDiffusion 在 PR-AUC、F1 和 Cohen's Kappa 方面优于 14 个前沿基线。我们还进行了消融研究,并将我们的模型与基于 GAN 的替代模型进行了比较,从而进一步验证了我们模型设计的合理性和适应性。此外,我们还分析了生成的数据,为模型的可解释性提供了新的见解。源代码可通过 https://shorturl.at/aerT0 获取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信