Web-Based Application Based on Human-in-the-Loop Deep Learning for Deidentifying Free-Text Data in Electronic Medical Records: Development and Usability Study.

IF 1.9 Q3 MEDICINE, RESEARCH & EXPERIMENTAL

Interactive Journal of Medical Research Pub Date : 2023-08-25 DOI:10.2196/46322

Leibo Liu, Oscar Perez-Concha, Anthony Nguyen, Vicki Bennett, Victoria Blake, Blanca Gallego, Louisa Jorm


Citations: 0

Abstract

Background: The narrative free-text data in electronic medical records (EMRs) contain valuable clinical information for analysis and research to inform better patient care. However, the release of free text for secondary use is hindered by concerns surrounding personally identifiable information (PII), as protecting individuals' privacy is paramount. Therefore, it is necessary to deidentify free text to remove PII. Manual deidentification is a time-consuming and labor-intensive process. Numerous automated deidentification approaches and systems have been developed over the past decade to overcome this challenge.

Objective: We sought to develop an accurate, web-based system deidentifying free text (DEFT), which can be readily and easily adopted in real-world settings for deidentification of free text in EMRs. The system has several key features including a simple and task-focused web user interface, customized PII types, use of a state-of-the-art deep learning model for tagging PII from free text, preannotation by an interactive learning loop, rapid manual annotation with autosave, support for project management and team collaboration, user access control, and central data storage.

Methods: DEFT comprises frontend and backend modules and communicates with central data storage through filesystem path access. The frontend web user interface provides end users with a user-friendly workspace for managing and annotating free text. The backend module processes requests from the frontend and performs the relevant persistence operations. DEFT manages the deidentification workflow as a project, which can contain one or more data sets. Customized PII types and user access control can also be configured. The deep learning model is based on a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) with RoBERTa as the word embedding layer. An interactive learning loop is integrated into DEFT to speed up the deidentification process and improve model performance over time.
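The interactive learning loop described in the Methods can be illustrated with a minimal sketch. It is not DEFT's implementation: `DictionaryTagger` is a toy stand-in for the BiLSTM-CRF tagger, and `interactive_loop`, `simulated_reviewer`, the `Span` layout, and the batch size are all hypothetical names and choices introduced for illustration. The sketch only shows the general pattern of preannotating, having a human correct the suggestions, and retraining on the corrected notes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (start, end, pii_type) with an exclusive end offset; this layout is an assumption.
Span = Tuple[int, int, str]


@dataclass
class DictionaryTagger:
    """Toy stand-in for a learned tagger: it memorises annotated surface strings."""
    lexicon: Dict[str, str] = field(default_factory=dict)

    def train(self, examples: List[Tuple[str, List[Span]]]) -> None:
        # "Training" here just records every annotated string and its PII type.
        for text, spans in examples:
            for start, end, pii_type in spans:
                self.lexicon[text[start:end]] = pii_type

    def predict(self, text: str) -> List[Span]:
        # Suggest a span wherever a memorised string occurs in the text.
        spans: List[Span] = []
        for surface, pii_type in self.lexicon.items():
            pos = text.find(surface)
            while pos != -1:
                spans.append((pos, pos + len(surface), pii_type))
                pos = text.find(surface, pos + 1)
        return sorted(spans)


def interactive_loop(
    notes: List[str],
    tagger: DictionaryTagger,
    review: Callable[[str, List[Span]], List[Span]],
    batch_size: int = 1,
) -> List[Tuple[str, List[Span]]]:
    """Preannotate each batch, collect human corrections, then retrain."""
    annotated: List[Tuple[str, List[Span]]] = []
    for i in range(0, len(notes), batch_size):
        for text in notes[i:i + batch_size]:
            suggested = tagger.predict(text)      # model preannotation
            corrected = review(text, suggested)   # human accepts or fixes spans
            annotated.append((text, corrected))
        tagger.train(annotated)                   # retrain on all corrected notes
    return annotated


# Simulated annotator that always returns the reference annotation.
gold = {
    "Seen by Dr Smith.": [(11, 16, "NAME")],
    "Dr Smith reviewed bloods.": [(3, 8, "NAME")],
}
suggestions_seen: List[List[Span]] = []


def simulated_reviewer(text: str, suggested: List[Span]) -> List[Span]:
    suggestions_seen.append(suggested)
    return gold[text]


tagger = DictionaryTagger()
annotated = interactive_loop(list(gold), tagger, simulated_reviewer, batch_size=1)
```

In this toy run the first note receives no suggestions because the tagger starts empty, while the second note is preannotated with the name learned from the first. The same pattern would apply with a real sequence tagger in place of the dictionary lookup.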

Results: DEFT has several advantages over existing deidentification systems, namely its support for project management, user access control, data management, and an interactive learning process. On the 2014 i2b2 data set, DEFT outperformed 5 benchmark models on microaverage strict entity-level recall and F₁-score, achieving 0.9563 and 0.9627, respectively. In a real-world use case of deidentifying clinical notes extracted from 1 referral hospital in Sydney, New South Wales, Australia, DEFT achieved a microaverage strict entity-level F₁-score of 0.9507 on a corpus of 600 annotated clinical notes. Moreover, manual annotation with preannotation demonstrated a 43% increase in work efficiency compared to annotation without preannotation.
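For readers unfamiliar with the metric, the following sketch shows one common way to compute microaverage strict entity-level precision, recall, and F₁-score. It assumes that "strict" means a predicted entity counts only if its document, boundaries, and PII type all match the reference exactly, and that "microaverage" means counts are pooled across all PII types before the scores are computed. The function name, the entity tuple layout, and the example data are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, Tuple

# (doc_id, start, end, pii_type); this tuple layout is an assumption.
Entity = Tuple[str, int, int, str]


def micro_strict_scores(gold: Iterable[Entity], pred: Iterable[Entity]) -> Dict[str, float]:
    """Pool all entities across types and score exact matches only."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)   # exact boundary and type match
    fp = len(pred_set - gold_set)   # predicted but not in the reference
    fn = len(gold_set - pred_set)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: the DATE prediction is off by one character,
# so under strict matching it is both a false positive and a false negative.
gold_entities = [
    ("note1", 0, 8, "NAME"),
    ("note1", 20, 30, "DATE"),
    ("note2", 5, 12, "HOSPITAL"),
]
pred_entities = [
    ("note1", 0, 8, "NAME"),
    ("note1", 20, 29, "DATE"),
    ("note2", 5, 12, "HOSPITAL"),
]
scores = micro_strict_scores(gold_entities, pred_entities)
```

With two exact matches, one spurious prediction, and one missed entity, precision, recall, and F₁ all equal 2/3 in this example.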

Conclusions: DEFT is designed for health domain researchers and data custodians to easily deidentify free text in EMRs. DEFT supports an interactive learning loop, and end users with minimal technical knowledge can perform the deidentification work with only a shallow learning curve.



Source journal

Interactive Journal of Medical Research (MEDICINE, RESEARCH & EXPERIMENTAL)

Self-citation rate: 0.00%

Review time: 12 weeks