Examining the Generalizability of Pretrained De-identification Transformer Models on Narrative Nursing Notes.

IF 2.1, Zone 2 (Medicine), Q4 MEDICAL INFORMATICS
Applied Clinical Informatics Pub Date : 2024-03-01 Epub Date: 2024-03-06 DOI:10.1055/a-2282-4340
Fangyi Chen, Syed Mohtashim Abbas Bokhari, Kenrick Cato, Gamze Gürsoy, Sarah Rossetti
Citation count: 0

Abstract

Background: Narrative nursing notes are a valuable resource in informatics research, offering unique predictive signals about patient care. The open sharing of these data, however, is appropriately constrained by rigorous regulations set by the Health Insurance Portability and Accountability Act (HIPAA) for the protection of privacy. Several de-identification models have been developed and evaluated on the open-source i2b2 dataset. Their generalizability to nursing notes remains understudied.

Objectives: The study aims to understand the generalizability of pretrained transformer models and to investigate how personal protected health information (PHI) distribution patterns vary between discharge summaries and nursing notes, with the goal of informing the future design of model evaluation schemas.

Methods: Two pretrained transformer models (RoBERTa, ClinicalBERT) fine-tuned on i2b2 2014 discharge summaries were evaluated on our inpatient nursing notes and compared with the baseline performance. Statistical testing was used to assess differences in PHI distribution between discharge summaries and nursing notes.

Results: RoBERTa achieved the best performance when tested on an external source of data, with an F1 score of 0.887 across PHI categories and 0.932 in the binary PHI task. Overall, discharge summaries contained more PHI instances and more categories of PHI than inpatient nursing notes.
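The two F1 figures above refer to two ways of scoring the same predictions: one requires the exact PHI category to match, the other only asks whether a token was flagged as PHI at all. The following is a minimal sketch of that distinction, not the authors' evaluation code. It assumes token-level, micro-averaged scoring, and the label names (`NAME`, `DATE`, `LOCATION`, `O`), the toy sequences, and the helper names `f1_score` and `to_binary` are hypothetical illustrations.

```python
from typing import List, Sequence


def f1_score(gold: Sequence[str], pred: Sequence[str], outside: str = "O") -> float:
    """Micro-averaged token-level F1 over all labels other than `outside`."""
    # A true positive is a PHI token whose predicted label matches exactly.
    tp = sum(1 for g, p in zip(gold, pred) if g == p and g != outside)
    pred_pos = sum(1 for p in pred if p != outside)
    gold_pos = sum(1 for g in gold if g != outside)
    if tp == 0:
        return 0.0
    precision = tp / pred_pos
    recall = tp / gold_pos
    return 2 * precision * recall / (precision + recall)


def to_binary(labels: Sequence[str], outside: str = "O") -> List[str]:
    """Collapse every PHI category into a single 'PHI' label."""
    return [outside if label == outside else "PHI" for label in labels]


# Toy example (assumed data): one NAME token is mislabeled as DATE,
# and one LOCATION token is missed entirely.
gold = ["O", "NAME", "NAME", "O", "DATE", "O", "LOCATION"]
pred = ["O", "NAME", "DATE", "O", "DATE", "O", "O"]

categorical_f1 = f1_score(gold, pred)
binary_f1 = f1_score(to_binary(gold), to_binary(pred))

print(f"categorical F1: {categorical_f1:.3f}")  # 0.571
print(f"binary F1:      {binary_f1:.3f}")       # 0.857
```

In this toy case the NAME token mislabeled as DATE counts as an error in the categorical score but as a correct detection in the binary score, which is one reason a binary PHI score can be higher than a per-category score for the same model.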

Conclusion: The study investigated the applicability of two pretrained transformers to inpatient nursing notes and examined how nursing notes and discharge summaries differ in their use of PHI. Discharge summaries presented a greater quantity of PHI instances and types than narrative nursing notes, but narrative nursing notes exhibited more diversity in the types of PHI present, with some pertaining to the patient's personal life. The insights obtained from the research help improve the design and selection of algorithms and contribute to the development of suitable performance thresholds for PHI.

Source journal
Applied Clinical Informatics (MEDICAL INFORMATICS)
CiteScore: 4.60
Self-citation rate: 24.10%
Publication volume: 132
Journal description: ACI is the third Schattauer journal dealing with biomedical and health informatics. It perfectly complements our other journals Methods of Information in Medicine and the Yearbook of Medical Informatics. The Yearbook of Medical Informatics being the “Milestone” or state-of-the-art journal and Methods of Information in Medicine being the “Science and Research” journal of IMIA, ACI intends to be the “Practical” journal of IMIA.