TASTE:用于电子健康记录表型的时态和静态张量因子化。

Ardavan Afshar, Ioakeim Perros, Haesun Park, Christopher deFilippi, Xiaowei Yan, Walter Stewart, Joyce Ho, Jimeng Sun
{"title":"TASTE:用于电子健康记录表型的时态和静态张量因子化。","authors":"Ardavan Afshar, Ioakeim Perros, Haesun Park, Christopher deFilippi, Xiaowei Yan, Walter Stewart, Joyce Ho, Jimeng Sun","doi":"10.1145/3368555.3384464","DOIUrl":null,"url":null,"abstract":"<p><p><i>Phenotyping electronic health records (EHR)</i> focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose <i>T</i>emporal <i>A</i>nd <i>S</i>tatic <i>TE</i>nsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 60 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.</p>","PeriodicalId":87342,"journal":{"name":"Proceedings of the ACM Conference on Health, Inference, and Learning","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924914/pdf/nihms-1587674.pdf","citationCount":"0","resultStr":"{\"title\":\"TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records.\",\"authors\":\"Ardavan Afshar, Ioakeim Perros, Haesun Park, Christopher deFilippi, Xiaowei Yan, Walter Stewart, Joyce Ho, Jimeng Sun\",\"doi\":\"10.1145/3368555.3384464\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p><i>Phenotyping electronic health records (EHR)</i> focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose <i>T</i>emporal <i>A</i>nd <i>S</i>tatic <i>TE</i>nsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 60 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.</p>\",\"PeriodicalId\":87342,\"journal\":{\"name\":\"Proceedings of the ACM Conference on Health, Inference, and Learning\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924914/pdf/nihms-1587674.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM Conference on Health, Inference, and Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3368555.3384464\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Conference on Health, Inference, and Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3368555.3384464","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

电子健康记录(EHR)表型分析的重点是定义有意义的患者组别(如心衰组和糖尿病组),并识别这些组别中患者的时间演变。张量因子化是表型分析的有效工具。现有的大多数研究都假定病人代表是一个静态的集合数据,或者只对时间数据进行建模。然而,真实的电子病历数据既包含时间信息(如纵向临床访问),也包含静态信息(如患者人口统计数据),这两种信息很难同时建模。在本文中,我们提出了时态和静态因式分解法(Temporal And Static TEnsor factorization,TASTE),它能对静态和时态信息进行联合建模,从而提取表型。TASTE 将 PARAFAC2 模型与非负矩阵因式分解相结合,为时间和静态张量建模。为了适应所提出的模型,我们将原始问题转化为更简单的问题,并以交替方式优化解决。对于每一个子问题,我们提出的数学重新表述都能带来高效的子问题求解器。在一项心力衰竭(HF)研究的大量电子病历数据上进行的综合实验证实,TASTE 比几种基线方法快 14 倍,而且得出的表型经心脏病专家确认具有临床意义。使用 TASTE 提取的 60 个表型,与使用具有 345 个特征的递归神经网络 (RNN) 的深度学习模型相比,简单的逻辑回归能达到相同的高频预测曲线下面积 (AUC)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records.

Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 60 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信