On the use of natural language processing to implement the target trial framework using unstructured data from the electronic health record
Nicole Rafalko, Milena Gianfrancesco, Neal D. Goldstein
Global Epidemiology, Volume 9, Article 100204 (published 2025-05-08). DOI: 10.1016/j.gloepi.2025.100204
The increasing availability and accessibility of electronic health record (EHR) data have made the EHR a rich secondary source for comparative effectiveness studies. To conduct such studies, many researchers are turning to the target trial framework (TTF) to emulate a hypothetical randomized clinical trial. The quality of this emulation depends, in part, on whether data are available and accessible for each component of the TTF. One overarching challenge with EHR data, however, is that unstructured fields, such as clinical encounter notes, contain copious patient details that require additional processing steps to extract when a study needs them. Natural language processing (NLP) encompasses a spectrum of methods that can help automate this extraction, ranging from simpler rule-based methods to machine learning and artificial intelligence approaches that can handle complex language structures. We discuss how NLP methods can augment the information and data available to researchers who aim to estimate a treatment effect from EHR data by using the TTF to emulate the hypothetical clinical trial. We conclude with recommendations for researchers interested in using NLP methods to obtain data stored in the free text of the EHR, as well as considerations regarding the quality and validity of these data for the TTF.
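The abstract contrasts simpler rule-based methods with machine learning approaches. As a hedged illustration of the rule-based end of that spectrum, the sketch below checks whether a clinical note affirms a history of heart failure, treated here as a hypothetical exclusion criterion for an emulated trial. The concept pattern, the negation cue list, and the names `extract_mentions` and `has_affirmed_history` are assumptions for illustration, not the authors' method. The negation handling is a much-simplified take on NegEx-style rules, and any real use would need validation against manually reviewed charts.

```python
import re
from dataclasses import dataclass

# Hypothetical concept for an eligibility criterion ("no history of heart failure")
# that is often documented only in free-text encounter notes.
CONCEPT = re.compile(r"\b(heart failure|chf)\b", re.IGNORECASE)
# Simplified negation cues (assumption; far smaller than a real NegEx trigger list).
NEGATION = re.compile(r"\b(no|denies|negative for|without)\b", re.IGNORECASE)


@dataclass
class Mention:
    sentence: str
    negated: bool


def extract_mentions(note: str) -> list[Mention]:
    """Return each sentence that mentions the concept, flagged if a negation cue precedes it."""
    mentions = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        match = CONCEPT.search(sentence)
        if match:
            # Only cues appearing before the concept in the same sentence count as negation.
            negated = bool(NEGATION.search(sentence[: match.start()]))
            mentions.append(Mention(sentence, negated))
    return mentions


def has_affirmed_history(note: str) -> bool:
    """True if at least one non-negated mention exists (a candidate exclusion)."""
    return any(not m.negated for m in extract_mentions(note))


if __name__ == "__main__":
    print(has_affirmed_history("Patient denies chest pain. No history of heart failure."))
    print(has_affirmed_history("History of CHF, EF 35%. On lisinopril."))
```

Rules like these are transparent and easy to audit, but brittle: this version misses negation that follows the concept (for example "CHF ruled out"), abbreviations outside the pattern, and hedged or historical statements. That brittleness is one reason the machine learning approaches mentioned above can be preferable for complex language.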