Knowledge Graph Building from Real-world Multisource “Dirty” Clinical Electronic Medical Records for Intelligent Consultation Applications

Xinlong Liu, Li-Qun Xu
Published in: 2021 IEEE International Conference on Digital Health (ICDH), pp. 260-265
Publication date: 2021-09-01
DOI: 10.1109/icdh52753.2021.00049 (https://doi.org/10.1109/icdh52753.2021.00049)
Citations: 4

Abstract

Intelligent clinical consultation is a diagnostic support system that infers likely diseases from a patient's chief complaints, based on established relationships between symptoms and diseases. The key is to automatically learn and build a general “symptom-disease” medical knowledge graph (MKG) from real-world clinical data. The quality of the clinical data (chiefly electronic medical records, EMRs) therefore directly affects the quality of the MKG, which in turn determines the quality of the consultation results. The regional public health information platform gathers a large number of EMR front pages from hospitals of all tiers across the region. Because the health IT systems used by hospitals are often sourced from different vendors, each of which may have its own data standards and data quality control criteria, the quality of the collected EMRs varies noticeably. Gaps in knowledge and skills between clinicians at different qualification levels widen this variation further. Through detailed analysis of one such collection, we found that the two most prominent problems are inconsistency in diagnosis results and mismatch between the diagnosis results and the chief complaints and history of present illness. To ensure the quality and effectiveness of a knowledge graph built from these real-world data, this paper proposes a “dirty” data cleaning framework that includes diagnostic result normalization and semantic similarity matching. The symptom-disease knowledge graph constructed from the cleaned data has been applied and verified in an intelligent consultation system.
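
The abstract names two cleaning steps (diagnostic result normalization and semantic similarity matching) without giving implementation details. The sketch below is only an illustration of how such a pipeline might be wired together, not the authors' method. The alias table, the reference symptom sets, the function names, and the 0.2 threshold are all hypothetical, and token-level Jaccard overlap stands in for whatever semantic similarity measure the paper actually uses.

```python
from collections import Counter, defaultdict

# Hypothetical alias table: maps free-text diagnoses to one canonical name.
DIAGNOSIS_ALIASES = {
    "uri": "upper respiratory infection",
    "upper resp. infection": "upper respiratory infection",
    "acute gastroenteritis": "gastroenteritis",
}

# Hypothetical reference symptom tokens for each canonical disease.
REFERENCE_SYMPTOMS = {
    "upper respiratory infection": {"cough", "fever", "sore", "throat"},
    "gastroenteritis": {"diarrhea", "vomiting", "abdominal", "pain"},
}


def normalize_diagnosis(raw):
    """Lowercase, collapse whitespace, then map known aliases to a canonical name."""
    key = " ".join(raw.lower().split())
    return DIAGNOSIS_ALIASES.get(key, key)


def jaccard(a, b):
    """Token-set overlap; a simple stand-in for a semantic similarity score."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def clean_records(records, threshold=0.2):
    """Keep records whose chief complaint is similar enough to the diagnosis."""
    kept = []
    for complaint, raw_diagnosis in records:
        disease = normalize_diagnosis(raw_diagnosis)
        tokens = set(complaint.lower().split())
        reference = REFERENCE_SYMPTOMS.get(disease, set())
        if jaccard(tokens, reference) >= threshold:
            kept.append((tokens, disease))
    return kept


def build_graph(cleaned):
    """Count symptom-disease co-occurrences as weighted graph edges."""
    graph = defaultdict(Counter)
    for tokens, disease in cleaned:
        for symptom in tokens & REFERENCE_SYMPTOMS.get(disease, set()):
            graph[symptom][disease] += 1
    return graph


records = [
    ("cough and fever", "URI"),
    ("fever sore throat", "Upper Resp. Infection"),
    ("cough and fever", "acute gastroenteritis"),  # complaint does not fit diagnosis
]
cleaned = clean_records(records)
graph = build_graph(cleaned)
```

In this toy data the first two records normalize to the same disease name and pass the similarity check, while the third is dropped because its chief complaint shares no tokens with the reference symptoms of its diagnosis. A real system would likely replace the Jaccard stand-in with embedding-based similarity and draw the alias table from a coding standard such as ICD.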