Using cross-domain knowledge augmentation to explore comorbidity in electronic health records data

IF 7.5 · CAS Zone 1, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Expert Systems with Applications Pub Date : 2024-11-05 DOI:10.1016/j.eswa.2024.125644

Kaiyuan Zhang, Buyue Qian, Xiyuan Zhang, Qinghua Zheng

Volume 263, Article 125644 · Journal Article · https://www.sciencedirect.com/science/article/pii/S0957417424025119

Citations: 0

Abstract

Research that concentrates on specific diseases or uses a single data source, such as medical histories or thematic data, has received considerable attention, but comorbidities remain little investigated. Some ad hoc methods have been used in the medical field, yet systematic approaches to this challenge are scarce. Representing patient features with heterogeneous, cross-domain data is difficult: directly mapping such data into a matrix frequently produces high dimensionality, sparsity, redundancy, and noise. In addition, because supervisory information plays a critical role in medicine, acquiring accurate information is paramount. To address these issues, we propose an enhanced clustering method that exploits cross-domain knowledge augmentation. The method iteratively learns the clustering outcome and the cross-domain knowledge, and the cross-domain knowledge matrix it produces can be interpreted as a measure of similarity between instances across domains. We validate the proposed model on real-world electronic health records (EHR) data, where it achieves a significant performance improvement over the baseline method and successfully completes the task of exploring comorbidity. Because EHR data are private, we also conduct extensive experiments on the publicly available DBLP and UCI datasets. The experimental results show that our algorithm outperforms the baseline algorithm and generalizes well.
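
The abstract's point about directly mapping heterogeneous data into a matrix can be illustrated with a minimal sketch. It is not the authors' method or data: the patient IDs, the diagnosis and prescription codes, and the helper names `to_matrix` and `sparsity` are all hypothetical, chosen only to show how a patient-by-code matrix grows wide and mostly zero as codes from several domains are pooled.

```python
# Hypothetical toy records: each patient has a few codes from two domains
# (diagnoses "dx:" and prescriptions "rx:"). All values are made up.
records = {
    "p1": {"dx:E11", "dx:I10", "rx:metformin"},
    "p2": {"dx:I10", "rx:lisinopril"},
    "p3": {"dx:J45", "rx:albuterol"},
    "p4": {"dx:E11", "rx:metformin", "rx:insulin"},
}


def to_matrix(records):
    """Map each patient to a 0/1 row over the union of all observed codes."""
    vocab = sorted(set().union(*records.values()))
    rows = {
        pid: [1 if code in codes else 0 for code in vocab]
        for pid, codes in records.items()
    }
    return vocab, rows


def sparsity(rows):
    """Return the fraction of zero entries in the patient-by-code matrix."""
    cells = [value for row in rows.values() for value in row]
    return cells.count(0) / len(cells)


vocab, rows = to_matrix(records)
print(len(vocab), "columns; sparsity =", round(sparsity(rows), 3))
```

Every new code adds a column that is zero for most patients, so the share of empty cells tends to rise as more domains are merged; this is the kind of difficulty the abstract cites as motivation for learning a compact cross-domain representation instead.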


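Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months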

Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.