Learning and diSentangling patient static information from time-series Electronic hEalth Records (STEER).

PLOS digital health. Pub Date: 2024-10-21. eCollection Date: 2024-10-01. DOI: 10.1371/journal.pdig.0000640
Wei Liao, Joel Voldman
PLOS digital health, volume 3, issue 10, article e0000640. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493250/pdf/

Abstract

Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. Previous work has shown that self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of data identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by such information. Here we systematically investigated the ability of time-series electronic health record data to predict patient static information. We found that not only the raw time-series data but also the representations learned by machine learning models can be used to predict a variety of static information, with area under the receiver operating characteristic curve (AUROC) as high as 0.851 for biological sex, 0.869 for binarized age, and 0.810 for self-reported race. This high predictive performance extends to various comorbidity factors and persists even when the model was trained for different tasks, on different cohorts, with different model architectures, and on different databases. Given the privacy and fairness concerns these findings raise, we developed a variational autoencoder-based approach that learns a structured latent space to disentangle patient-sensitive attributes from time-series data. Our work thoroughly investigates the ability of machine learning models to encode patient static information from time-series electronic health records and introduces a general approach to protect patient-sensitive information for downstream tasks.
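
The abstract reports its results as area under the receiver operating characteristic curve. As a reading aid, the following is a minimal sketch of how that metric can be computed for a binary static attribute. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are hypothetical, chosen only for illustration. The sketch uses the pairwise definition, in which the value equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier outputs for a binary attribute (illustration only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.810 to 0.869. The pairwise loop is quadratic in the number of examples; it is written for clarity, not for large cohorts.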
