TransLSTD: Augmenting hierarchical disease risk prediction model with time and context awareness via disease clustering

Impact factor: 3.0 · Zone 2, Computer Science · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Systems · Pub Date: 2024-04-19 · DOI: 10.1016/j.is.2024.102390

Tao You, Qiaodong Dang, Qing Li, Peng Zhang, Guanzhong Wu, Wei Huang

Information Systems, volume 124, Article 102390. Full text: https://www.sciencedirect.com/science/article/pii/S0306437924000486

Citations: 0

Abstract

The use of electronic health records (EHRs) has become widespread, providing a valuable source of information for predicting disease risk. Deep neural network models, supplemented with medical domain knowledge for interpretability, have been proposed and shown to be effective in this task, but several limitations remain. First, chronic and acute diseases are often not differentiated, which biases the modeling of diseases. Second, extracting only single-layer temporal patterns from patient records limits comprehensive representation and predictive power. Third, the weak interpretability of deep neural networks prevents the extraction of valuable medical knowledge, limiting practical applications. To overcome these challenges, we propose TransLSTD, a hierarchical model that incorporates time awareness and context awareness while distinguishing between long-term and short-term diseases. TransLSTD uses clustering algorithms to classify disease types based on the occurrence feature matrix of diseases from an EHR dataset, and updates disease representations at the code level while creating patient visit embeddings. The model uses query vectors to incorporate visit context information and combines it with time data to capture the patient’s overall health status. Finally, the prediction module generates outcomes and provides effective interpretations. We demonstrate the effectiveness of TransLSTD on two real-world datasets, where it outperforms state-of-the-art models in terms of both AUC and F1. The data and code are released at https://github.com/DangQD/TransLSTD-master.
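The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which occurrence features or which clustering algorithm TransLSTD uses, so the two features below (how often a code recurs within a patient, and the share of that patient's visits containing it), the plain 2-means routine, and the names `occurrence_features` and `two_means` are all assumptions chosen for illustration.

```python
import numpy as np


def occurrence_features(patients, n_codes):
    """Build a per-code occurrence feature matrix (hypothetical features).

    patients: list of patients; each patient is a list of visits;
              each visit is a set of integer disease codes in [0, n_codes).
    Returns an (n_codes, 2) array with, for each code:
      [share of affected patients in whom the code appears in >= 2 visits,
       mean fraction of an affected patient's visits that contain the code].
    """
    seen = np.zeros(n_codes)   # patients in whom the code appears at all
    recur = np.zeros(n_codes)  # patients in whom the code appears in >= 2 visits
    frac = np.zeros(n_codes)   # summed per-patient fraction of visits with the code
    for visits in patients:
        counts = np.zeros(n_codes)
        for visit in visits:
            for code in visit:
                counts[code] += 1
        seen += counts > 0
        recur += counts >= 2
        frac += counts / len(visits)
    seen = np.maximum(seen, 1)  # avoid division by zero for codes never observed
    return np.stack([recur / seen, frac / seen], axis=1)


def two_means(x, n_iter=50):
    """Plain 2-means clustering (an assumed stand-in for the paper's algorithm).

    Centers are initialised at the rows with the smallest and largest feature
    sum, so cluster 1 starts as the more persistent, "long-term-like" group.
    """
    order = x.sum(axis=1)
    centers = np.stack([x[order.argmin()], x[order.argmax()]])
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.stack([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Toy data (invented): codes 0 and 1 recur across visits, codes 2 and 3 do not.
toy_patients = [
    [{0, 2}, {0}, {0, 1}],
    [{1}, {1, 3}, {0, 1}],
    [{0, 1}, {0, 1, 2}],
]
features = occurrence_features(toy_patients, n_codes=4)
labels, centers = two_means(features)
print(labels.tolist())
```

On this toy input the recurring codes 0 and 1 fall into cluster 1 and the one-off codes 2 and 3 into cluster 0. In a real pipeline the cluster labels would then decide which disease codes are treated as long-term and which as short-term when building visit embeddings; how TransLSTD does that is described in the paper and the released code, not here.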


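Source journal

Information Systems (Engineering & Technology: Computer Science, Information Systems)

CiteScore: 9.40

Self-citation rate: 2.70%

Articles published: 112

Review time: 53 days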

About the journal: Information systems are the software and hardware systems that support data-intensive applications. The journal Information Systems publishes articles concerning the design and implementation of languages, data models, process models, algorithms, software and hardware for information systems. Subject areas include data management issues as presented in the principal international database conferences (e.g., ACM SIGMOD/PODS, VLDB, ICDE and ICDT/EDBT) as well as data-related issues from the fields of data mining/machine learning, information retrieval coordinated with structured data, internet and cloud data management, business process management, web semantics, visual and audio information systems, scientific computing, and data science. Implementation papers on massively parallel data management, fault tolerance in practice, and special-purpose hardware for data-intensive systems are also welcome, as are manuscripts from application domains such as urban informatics, social and natural science, and the Internet of Things. All papers should highlight innovative solutions to data management problems, such as new data models and performance enhancements, and show how those innovations contribute to the goals of the application.