A modular pipeline for natural language processing-screened human abstraction of a pragmatic trial outcome from electronic health records.

Robert Y Lee, Kevin S Li, James Sibley, Trevor Cohen, William B Lober, Janaki O'Brien, Nicole LeDuc, Kasey Mallon Andrews, Anna Ungar, Jessica Walsh, Elizabeth L Nielsen, Danae G Dotolo, Erin K Kross
{"title":"A modular pipeline for natural language processing-screened human abstraction of a pragmatic trial outcome from electronic health records.","authors":"Robert Y Lee, Kevin S Li, James Sibley, Trevor Cohen, William B Lober, Janaki O'Brien, Nicole LeDuc, Kasey Mallon Andrews, Anna Ungar, Jessica Walsh, Elizabeth L Nielsen, Danae G Dotolo, Erin K Kross","doi":"10.1101/2025.06.23.25330134","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Natural language processing (NLP) allows efficient extraction of clinical variables and outcomes from electronic health records (EHR). However, measuring pragmatic clinical trial outcomes may demand accuracy that exceeds NLP performance. Combining NLP with human adjudication can address this gap, yet few software solutions support such workflows. We developed a modular, scalable system for NLP-screened human abstraction to measure the primary outcomes of two clinical trials.</p><p><strong>Methods: </strong>In two clinical trials of hospitalized patients with serious illness, a deep-learning NLP model screened EHR passages for documented goals-of-care discussions. Screen-positive passages were referred for human adjudication using a REDCap-based system to measure the trial outcomes. Dynamic pooling of passages using structured query language (SQL) within the REDCap database reduced unnecessary abstraction while ensuring data completeness.</p><p><strong>Results: </strong>In the first trial (N=2,512), NLP identified 22,187 screen-positive passages (0.8%) from 2.6 million EHR passages. Human reviewers adjudicated 7,494 passages over 34.3 abstractor-hours to measure the cumulative incidence and time to first documented goals-of-care discussion for all patients with 92.6% patient-level sensitivity. In the second trial (N=617), NLP identified 8,952 screen-positive passages (1.6%) from 559,596 passages at a threshold with near-100% sensitivity. Human reviewers adjudicated 3,509 passages over 27.9 abstractor-hours to measure the same outcome for all patients.</p><p><strong>Conclusion: </strong>We present the design and source code for a scalable and efficient pipeline for measuring complex EHR-derived outcomes using NLP-screened human abstraction. This implementation is adaptable to diverse research needs, and its modular pipeline represents a practical middle ground between custom software and commercial platforms.</p>","PeriodicalId":94281,"journal":{"name":"medRxiv : the preprint server for health sciences","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12262768/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"medRxiv : the preprint server for health sciences","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2025.06.23.25330134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Natural language processing (NLP) allows efficient extraction of clinical variables and outcomes from electronic health records (EHRs). However, measuring pragmatic clinical trial outcomes may demand accuracy beyond what NLP alone can achieve. Combining NLP with human adjudication can address this gap, yet few software solutions support such workflows. We developed a modular, scalable system for NLP-screened human abstraction to measure the primary outcomes of two clinical trials.

Methods: In two clinical trials of hospitalized patients with serious illness, a deep-learning NLP model screened EHR passages for documented goals-of-care discussions. Screen-positive passages were referred for human adjudication using a REDCap-based system to measure the trial outcomes. Dynamic pooling of passages using structured query language (SQL) within the REDCap database reduced unnecessary abstraction while ensuring data completeness.
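
The pipeline's full source code accompanies the paper; the following is only a minimal sketch of the two stages described above, with hypothetical model, table, and column names throughout (and SQLite standing in for REDCap's actual backing database). Stage 1 applies a sensitivity-oriented NLP screening threshold; stage 2 dynamically pools passages for review, skipping those already adjudicated and, as one plausible rule for a time-to-first-event outcome, those dated after a patient's earliest confirmed goals-of-care discussion.

```python
# Minimal sketch of the two pipeline stages described above. All model,
# table, and column names are hypothetical illustrations; the trials'
# actual REDCap schema and NLP model are in the paper's source code.
import sqlite3
from dataclasses import dataclass

@dataclass
class Passage:
    passage_id: int
    patient_id: int
    note_time: str   # ISO-8601 timestamp of the source note
    text: str

def screen_passages(passages, model, threshold=0.1):
    """Stage 1: NLP screening. A deliberately low threshold favors
    sensitivity, deferring precision to the human adjudication stage."""
    for p in passages:
        if model.predict_proba(p.text) >= threshold:  # assumed interface
            yield p

# Stage 2: dynamic pooling. Serve only screen-positive passages that are
# not yet adjudicated, and (one plausible completeness-preserving rule
# for a time-to-first-event outcome) skip passages dated after a
# patient's earliest confirmed goals-of-care discussion, since they
# cannot change that patient's outcome.
POOL_QUERY = """
SELECT p.passage_id, p.patient_id, p.note_time, p.text
FROM passages AS p
LEFT JOIN (
    SELECT p2.patient_id, MIN(p2.note_time) AS first_goc_time
    FROM passages AS p2
    JOIN adjudications AS a ON a.passage_id = p2.passage_id
    WHERE a.is_goc_discussion = 1
    GROUP BY p2.patient_id
) AS f ON f.patient_id = p.patient_id
WHERE p.screen_positive = 1
  AND NOT EXISTS (SELECT 1 FROM adjudications AS a2
                  WHERE a2.passage_id = p.passage_id)
  AND (f.first_goc_time IS NULL OR p.note_time < f.first_goc_time)
ORDER BY p.patient_id, p.note_time;
"""

def next_pool(conn: sqlite3.Connection, batch_size: int = 50) -> list[Passage]:
    """Return the next batch of passages awaiting human adjudication."""
    rows = conn.execute(POOL_QUERY).fetchmany(batch_size)
    return [Passage(*row) for row in rows]
```

Serving reviewers from a live query rather than a static worklist is what lets each completed adjudication immediately shrink the remaining pool.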

Results: In the first trial (N=2,512), NLP identified 22,187 screen-positive passages (0.8%) from 2.6 million EHR passages. Human reviewers adjudicated 7,494 passages over 34.3 abstractor-hours to measure the cumulative incidence and time to first documented goals-of-care discussion for all patients with 92.6% patient-level sensitivity. In the second trial (N=617), NLP identified 8,952 screen-positive passages (1.6%) from 559,596 passages at a threshold with near-100% sensitivity. Human reviewers adjudicated 3,509 passages over 27.9 abstractor-hours to measure the same outcome for all patients.
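
A quick pass over these reported figures (a rough check using the rounded totals from the abstract, not an analysis from the paper) illustrates the efficiency of the two-stage design:

```python
# Derived efficiency figures from the reported results; totals are as
# rounded in the abstract, so percentages are approximate.
trials = {
    "Trial 1": dict(passages=2_600_000, screen_pos=22_187,
                    adjudicated=7_494, hours=34.3),
    "Trial 2": dict(passages=559_596, screen_pos=8_952,
                    adjudicated=3_509, hours=27.9),
}
for name, t in trials.items():
    reviewed_pct = 100 * t["adjudicated"] / t["passages"]
    pooled_pct = 100 * t["adjudicated"] / t["screen_pos"]
    rate = t["adjudicated"] / t["hours"]
    print(f"{name}: humans read {reviewed_pct:.2f}% of all passages "
          f"({pooled_pct:.0f}% of screen-positives after pooling), "
          f"~{rate:.0f} passages/abstractor-hour")
# -> Trial 1: humans read 0.29% of all passages (34% of screen-positives after pooling), ~218 passages/abstractor-hour
# -> Trial 2: humans read 0.63% of all passages (39% of screen-positives after pooling), ~126 passages/abstractor-hour
```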

Conclusion: We present the design and source code for a scalable and efficient pipeline for measuring complex EHR-derived outcomes using NLP-screened human abstraction. This implementation is adaptable to diverse research needs, and its modular pipeline represents a practical middle ground between custom software and commercial platforms.

