非结构化临床记录的神经语言建模用于自动患者表型

Akshara Prabhakar, S. Shidharth, Sowmya S Kamath
{"title":"非结构化临床记录的神经语言建模用于自动患者表型","authors":"Akshara Prabhakar, S. Shidharth, Sowmya S Kamath","doi":"10.1109/CISS53076.2022.9751198","DOIUrl":null,"url":null,"abstract":"The availability of huge volume and variety of healthcare data provides a wide scope for designing cutting-edge clinical decision support systems (CDSS) that can improve the quality of patient care. Identifying patients suffering from certain conditions/symptoms, commonly referred to as phenotyping, is a fundamental problem that can be addressed using the rich health-related data collected for generation of Electronic Health Records (EHRs). Phenotyping forms the foundation for translational research, effectiveness studies, and is used for analyzing population health using regularly collected EHR data. Also, determining if a patient has a particular medical condition is crucial for secondary analysis, such as in critical care situations to predict potential drug interactions and adverse events. In this paper, we consider all categories of unstructured clinical notes of patients, typically stored as part of EHRs in the raw form. The standard MIMIC-III dataset is considered for benchmark experiments for patient phenotyping. Experiments revealed that our proposed models outperformed state-of-the art works built on vanilla BERT & ClinicalBERT models on the patient cohort considered, measured in terms of standard multi-label classification metrics like AUROC score (improvement by 6%), F1-score (by 4%), and Hamming Loss (by 17%) when we considered only patient discharge summaries and radiology notes. Further experiments with other note categories showed that using discharge summaries and physician notes yields significant improvements on the entire dataset giving 0.8 AUROC score, 0.72 F1 score, 0.09 Hamming loss.","PeriodicalId":305918,"journal":{"name":"2022 56th Annual Conference on Information Sciences and Systems (CISS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping\",\"authors\":\"Akshara Prabhakar, S. Shidharth, Sowmya S Kamath\",\"doi\":\"10.1109/CISS53076.2022.9751198\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The availability of huge volume and variety of healthcare data provides a wide scope for designing cutting-edge clinical decision support systems (CDSS) that can improve the quality of patient care. Identifying patients suffering from certain conditions/symptoms, commonly referred to as phenotyping, is a fundamental problem that can be addressed using the rich health-related data collected for generation of Electronic Health Records (EHRs). Phenotyping forms the foundation for translational research, effectiveness studies, and is used for analyzing population health using regularly collected EHR data. Also, determining if a patient has a particular medical condition is crucial for secondary analysis, such as in critical care situations to predict potential drug interactions and adverse events. In this paper, we consider all categories of unstructured clinical notes of patients, typically stored as part of EHRs in the raw form. The standard MIMIC-III dataset is considered for benchmark experiments for patient phenotyping. Experiments revealed that our proposed models outperformed state-of-the art works built on vanilla BERT & ClinicalBERT models on the patient cohort considered, measured in terms of standard multi-label classification metrics like AUROC score (improvement by 6%), F1-score (by 4%), and Hamming Loss (by 17%) when we considered only patient discharge summaries and radiology notes. Further experiments with other note categories showed that using discharge summaries and physician notes yields significant improvements on the entire dataset giving 0.8 AUROC score, 0.72 F1 score, 0.09 Hamming loss.\",\"PeriodicalId\":305918,\"journal\":{\"name\":\"2022 56th Annual Conference on Information Sciences and Systems (CISS)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 56th Annual Conference on Information Sciences and Systems (CISS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CISS53076.2022.9751198\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 56th Annual Conference on Information Sciences and Systems (CISS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CISS53076.2022.9751198","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

大量和各种医疗保健数据的可用性为设计可以提高患者护理质量的尖端临床决策支持系统(CDSS)提供了广泛的范围。识别患有某些病症/症状(通常称为表型)的患者是一个基本问题,可以使用为生成电子健康记录(EHRs)而收集的丰富健康相关数据来解决这个问题。表型形成了转化研究、有效性研究的基础,并用于使用定期收集的电子病历数据分析人群健康。此外,确定患者是否有特定的医疗状况对于二次分析至关重要,例如在重症监护情况下预测潜在的药物相互作用和不良事件。在本文中,我们考虑了患者的所有类别的非结构化临床笔记,通常以原始形式存储为电子病历的一部分。标准的MIMIC-III数据集被认为是患者表型的基准实验。实验表明,我们提出的模型在考虑的患者队列中优于基于vanilla BERT和ClinicalBERT模型的最先进的作品,当我们只考虑患者出院总结和放射记录时,以标准的多标签分类指标如AUROC评分(提高6%),f1评分(提高4%)和Hamming Loss(提高17%)来衡量。对其他笔记类别的进一步实验表明,使用出院摘要和医生笔记对整个数据集产生了显著的改进,AUROC得分为0.8,F1得分为0.72,Hamming损失为0.09。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping
The availability of huge volume and variety of healthcare data provides a wide scope for designing cutting-edge clinical decision support systems (CDSS) that can improve the quality of patient care. Identifying patients suffering from certain conditions/symptoms, commonly referred to as phenotyping, is a fundamental problem that can be addressed using the rich health-related data collected for generation of Electronic Health Records (EHRs). Phenotyping forms the foundation for translational research, effectiveness studies, and is used for analyzing population health using regularly collected EHR data. Also, determining if a patient has a particular medical condition is crucial for secondary analysis, such as in critical care situations to predict potential drug interactions and adverse events. In this paper, we consider all categories of unstructured clinical notes of patients, typically stored as part of EHRs in the raw form. The standard MIMIC-III dataset is considered for benchmark experiments for patient phenotyping. Experiments revealed that our proposed models outperformed state-of-the art works built on vanilla BERT & ClinicalBERT models on the patient cohort considered, measured in terms of standard multi-label classification metrics like AUROC score (improvement by 6%), F1-score (by 4%), and Hamming Loss (by 17%) when we considered only patient discharge summaries and radiology notes. Further experiments with other note categories showed that using discharge summaries and physician notes yields significant improvements on the entire dataset giving 0.8 AUROC score, 0.72 F1 score, 0.09 Hamming loss.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信