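On the use of natural language processing to implement the target trial framework using unstructured data from the electronic health record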

Global Epidemiology. Pub Date: 2025-05-08. DOI: 10.1016/j.gloepi.2025.100204

Nicole Rafalko, Milena Gianfrancesco, Neal D. Goldstein


Citations: 0

Abstract



The increasing availability and accessibility of electronic health record (EHR) data has made it a rich secondary source to conduct comparative effectiveness studies. To perform such studies, many researchers are turning to the target trial framework (TTF) to emulate the hypothetical randomized clinical trial. The quality of this emulation depends, in part, on the availability and accessibility of data for each component of the TTF. Yet one overarching challenge with using EHR data is that unstructured fields, such as clinical encounter notes, contain copious details on the patient yet require additional steps to extract if needed in the conduct of the study. Natural language processing (NLP) represents a spectrum of methods to assist with automating this extraction, from simpler rule-based methods to machine learning and artificial intelligence approaches that can handle complex language structures. What follows is a discussion on how NLP methods can augment information and data for researchers looking to estimate a treatment effect using EHR data via the TTF to emulate the hypothetical clinical trial. We conclude with recommendations for researchers interested in using NLP methods to obtain data stored in the free text of the EHR as well as considerations regarding the quality and validity of this data for the TTF.
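The "simpler rule-based methods" end of the NLP spectrum mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the target concept (tobacco use), the negation cue list, the naive sentence splitter, and the function name `extract_mentions`. It flags each mention of the concept in a note as negated or affirmed by looking for a negation cue earlier in the same sentence.

```python
import re

# Hypothetical negation cues and target concept; not taken from the article.
NEGATION_CUES = ("no", "denies", "denied", "without", "negative for")
TARGET = re.compile(r"\b(smok(?:es|er|ing)|tobacco use)\b", re.IGNORECASE)


def extract_mentions(note: str) -> list[dict]:
    """Return one record per target mention, flagged as negated or affirmed."""
    results = []
    # Naive sentence split on '.', ';' or newline; real notes need a better splitter.
    for sentence in re.split(r"[.;\n]+", note):
        for match in TARGET.finditer(sentence):
            # Look only at the text before the mention within the same sentence.
            prefix = sentence[: match.start()].lower()
            negated = any(
                re.search(rf"\b{re.escape(cue)}\b", prefix) for cue in NEGATION_CUES
            )
            results.append({"mention": match.group(0), "negated": negated})
    return results


note = "Patient denies tobacco use. Father was a heavy smoker; no chest pain."
mentions = extract_mentions(note)
print(mentions)
```

In this example the second mention is reported as affirmed even though it refers to the patient's father rather than the patient. Rules of this kind do not capture who a statement is about, which is one reason extracted variables would likely need validation against chart review before they are used to define eligibility, treatment, or outcome in a target trial emulation.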


Source journal: Global Epidemiology (Medicine: Infectious Diseases)

CiteScore: 5.00

Self-citation rate: 0.00%

Review time: 39 days