Neural Language Modeling of Unstructured Clinical Notes for Automated Patient Phenotyping

2022 56th Annual Conference on Information Sciences and Systems (CISS). Pub Date: 2022-03-09. DOI: 10.1109/CISS53076.2022.9751198

Akshara Prabhakar, S. Shidharth, Sowmya S Kamath


Citations: 1

Abstract



The availability of a large volume and variety of healthcare data provides wide scope for designing cutting-edge clinical decision support systems (CDSS) that can improve the quality of patient care. Identifying patients suffering from certain conditions or symptoms, commonly referred to as phenotyping, is a fundamental problem that can be addressed using the rich health-related data collected for the generation of Electronic Health Records (EHRs). Phenotyping forms the foundation for translational research and effectiveness studies, and is used for analyzing population health using regularly collected EHR data. Determining whether a patient has a particular medical condition is also crucial for secondary analysis, such as predicting potential drug interactions and adverse events in critical care situations. In this paper, we consider all categories of unstructured clinical notes of patients, typically stored in raw form as part of EHRs. The standard MIMIC-III dataset is used for benchmark experiments on patient phenotyping. Experiments revealed that, when only patient discharge summaries and radiology notes were considered, our proposed models outperformed state-of-the-art works built on vanilla BERT and ClinicalBERT models on the patient cohort considered, as measured by standard multi-label classification metrics: AUROC score (improvement of 6%), F1-score (4%), and Hamming loss (17%). Further experiments with other note categories showed that using discharge summaries and physician notes yields significant improvements on the entire dataset, giving an AUROC score of 0.8, an F1 score of 0.72, and a Hamming loss of 0.09.
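
For readers unfamiliar with the multi-label metrics named in the abstract, the following is a minimal sketch of how Hamming loss and F1 can be computed when each patient may carry several phenotype labels at once. It is an illustration only, not the authors' evaluation code: the toy label matrices, the function names `hamming_loss` and `micro_f1`, and the choice of micro-averaging are all assumptions (the abstract does not state which F1 averaging was used). AUROC is omitted because it needs predicted scores rather than hard 0/1 labels.

```python
from typing import Sequence

# Each row is one patient; each column is one phenotype label (1 = present).
Labels = Sequence[Sequence[int]]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual (patient, label) slots predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from counts pooled over all labels (micro-averaging, assumed)."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 3 patients, 4 phenotype labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]]

print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")
print(f"Micro F1:     {micro_f1(y_true, y_pred):.3f}")
```

Lower Hamming loss is better, while higher F1 and AUROC are better, which is why the abstract reports the 17% change in Hamming loss as an improvement alongside gains in the other two metrics.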
