基于非结构化电子病历的肺癌总生存期预测的层次嵌入关注。

IF 3.3 3区 医学 Q2 MEDICAL INFORMATICS
Domenico Paolo, Carlo Greco, Alessio Cortellini, Sara Ramella, Paolo Soda, Alessandro Bria, Rosa Sicilia
{"title":"基于非结构化电子病历的肺癌总生存期预测的层次嵌入关注。","authors":"Domenico Paolo, Carlo Greco, Alessio Cortellini, Sara Ramella, Paolo Soda, Alessandro Bria, Rosa Sicilia","doi":"10.1186/s12911-025-02998-6","DOIUrl":null,"url":null,"abstract":"<p><p>The automated processing of Electronic Health Records (EHRs) poses a significant challenge due to their unstructured nature, rich in valuable, yet disorganized information. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), has been instrumental in extracting structured information from EHR data. However, existing literature primarly focuses on extracting handcrafted clinical features through NLP and NER methods without delving into their learned representations. In this work, we explore the untapped potential of these representations by considering their contextual richness and entity-specific information. Our proposed methodology extracts representations generated by a transformer-based NER model on EHRs data, combines them using a hierarchical attention mechanism, and employs the obtained enriched representation as input for a clinical prediction model. Specifically, this study addresses Overall Survival (OS) in Non-Small Cell Lung Cancer (NSCLC) using unstructured EHRs data collected from an Italian clinical centre encompassing 838 records from 231 lung cancer patients. Whilst our study is applied on EHRs written in Italian, it serves as use case to prove the effectiveness of extracting and employing high level textual representations that capture relevant information as named entities. Our methodology is interpretable because the hierarchical attention mechanism highlights the information in EHRs that the model considers the most crucial during the decision-making process. We validated this interpretability by measuring the agreement of domain experts on the importance assigned by the hierarchical attention mechanism to EHRs information through a questionnaire. Results demonstrate the effectiveness of our method, showcasing statistically significant improvements over traditional manually extracted clinical features.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"169"},"PeriodicalIF":3.3000,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12007135/pdf/","citationCount":"0","resultStr":"{\"title\":\"Hierarchical embedding attention for overall survival prediction in lung cancer from unstructured EHRs.\",\"authors\":\"Domenico Paolo, Carlo Greco, Alessio Cortellini, Sara Ramella, Paolo Soda, Alessandro Bria, Rosa Sicilia\",\"doi\":\"10.1186/s12911-025-02998-6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The automated processing of Electronic Health Records (EHRs) poses a significant challenge due to their unstructured nature, rich in valuable, yet disorganized information. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), has been instrumental in extracting structured information from EHR data. However, existing literature primarly focuses on extracting handcrafted clinical features through NLP and NER methods without delving into their learned representations. In this work, we explore the untapped potential of these representations by considering their contextual richness and entity-specific information. Our proposed methodology extracts representations generated by a transformer-based NER model on EHRs data, combines them using a hierarchical attention mechanism, and employs the obtained enriched representation as input for a clinical prediction model. Specifically, this study addresses Overall Survival (OS) in Non-Small Cell Lung Cancer (NSCLC) using unstructured EHRs data collected from an Italian clinical centre encompassing 838 records from 231 lung cancer patients. Whilst our study is applied on EHRs written in Italian, it serves as use case to prove the effectiveness of extracting and employing high level textual representations that capture relevant information as named entities. Our methodology is interpretable because the hierarchical attention mechanism highlights the information in EHRs that the model considers the most crucial during the decision-making process. We validated this interpretability by measuring the agreement of domain experts on the importance assigned by the hierarchical attention mechanism to EHRs information through a questionnaire. Results demonstrate the effectiveness of our method, showcasing statistically significant improvements over traditional manually extracted clinical features.</p>\",\"PeriodicalId\":9340,\"journal\":{\"name\":\"BMC Medical Informatics and Decision Making\",\"volume\":\"25 1\",\"pages\":\"169\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-04-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12007135/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Medical Informatics and Decision Making\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1186/s12911-025-02998-6\",\"RegionNum\":3,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MEDICAL INFORMATICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Medical Informatics and Decision Making","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1186/s12911-025-02998-6","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MEDICAL INFORMATICS","Score":null,"Total":0}
引用次数: 0

摘要

电子健康记录(EHRs)的自动化处理由于其非结构化的性质、丰富的有价值但无组织的信息而提出了重大挑战。自然语言处理(NLP),特别是命名实体识别(NER),在从电子病历数据中提取结构化信息方面发挥了重要作用。然而,现有文献主要侧重于通过NLP和NER方法提取手工制作的临床特征,而没有深入研究它们的学习表征。在这项工作中,我们通过考虑它们的上下文丰富性和实体特定信息来探索这些表征尚未开发的潜力。我们提出的方法提取由基于变压器的NER模型在EHRs数据上生成的表征,使用分层注意机制将它们组合起来,并将获得的丰富表征作为临床预测模型的输入。具体而言,本研究使用从意大利临床中心收集的非结构化电子病历数据,包括来自231名肺癌患者的838条记录,研究了非小细胞肺癌(NSCLC)的总生存率(OS)。虽然我们的研究应用于用意大利语编写的电子病历,但它可以作为用例来证明提取和使用高级文本表示的有效性,这些文本表示将相关信息捕获为命名实体。我们的方法是可解释的,因为分层注意机制突出了模型在决策过程中认为最重要的电子病历信息。我们通过问卷调查来衡量领域专家对分层关注机制对电子病历信息分配的重要性的共识,从而验证了这种可解释性。结果证明了我们的方法的有效性,与传统的手动提取临床特征相比,在统计上有显著的改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Hierarchical embedding attention for overall survival prediction in lung cancer from unstructured EHRs.

The automated processing of Electronic Health Records (EHRs) poses a significant challenge due to their unstructured nature, rich in valuable, yet disorganized information. Natural Language Processing (NLP), particularly Named Entity Recognition (NER), has been instrumental in extracting structured information from EHR data. However, existing literature primarly focuses on extracting handcrafted clinical features through NLP and NER methods without delving into their learned representations. In this work, we explore the untapped potential of these representations by considering their contextual richness and entity-specific information. Our proposed methodology extracts representations generated by a transformer-based NER model on EHRs data, combines them using a hierarchical attention mechanism, and employs the obtained enriched representation as input for a clinical prediction model. Specifically, this study addresses Overall Survival (OS) in Non-Small Cell Lung Cancer (NSCLC) using unstructured EHRs data collected from an Italian clinical centre encompassing 838 records from 231 lung cancer patients. Whilst our study is applied on EHRs written in Italian, it serves as use case to prove the effectiveness of extracting and employing high level textual representations that capture relevant information as named entities. Our methodology is interpretable because the hierarchical attention mechanism highlights the information in EHRs that the model considers the most crucial during the decision-making process. We validated this interpretability by measuring the agreement of domain experts on the importance assigned by the hierarchical attention mechanism to EHRs information through a questionnaire. Results demonstrate the effectiveness of our method, showcasing statistically significant improvements over traditional manually extracted clinical features.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
7.20
自引率
5.70%
发文量
297
审稿时长
1 months
期刊介绍: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信