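Decoding Recurrence in Early-Stage and Locoregionally Advanced Non-Small Cell Lung Cancer: Insights From Electronic Health Records and Natural Language Processing.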

IF 3.3, JCR Q2 (Oncology)
JCO Clinical Cancer Informatics. Pub Date: 2025-04-01. Epub Date: 2025-04-18. DOI: 10.1200/CCI-24-00227
Kyeryoung Lee, Zongzhi Liu, Qing Huang, David Corrigan, Iftekhar Kalsekar, Tomi Jun, Gustavo Stolovitzky, William K Oh, Ravi Rajaram, Xiaoyan Wang
JCO Clinical Cancer Informatics, volume 9, e2400227. Publication type: Journal Article. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12011440/pdf/
Citations: 0

Decoding Recurrence in Early-Stage and Locoregionally Advanced Non-Small Cell Lung Cancer: Insights From Electronic Health Records and Natural Language Processing.

Purpose: Recurrences after curative resection in early-stage and locoregionally advanced non-small cell lung cancer (NSCLC) are common, necessitating a nuanced understanding of associated risk factors. This study aimed to establish a natural language processing (NLP) system to efficiently curate recurrence data in NSCLC and analyze risk factors longitudinally.

Patients and methods: Electronic health records of 6,351 patients with NSCLC with >700,000 notes were obtained from Mount Sinai's data sets. A deep learning-based customized NLP system was developed to identify cohorts experiencing recurrence. Recurrence types and rates over time were stratified by various clinical features. Cohort description analysis, Kaplan-Meier analysis for overall recurrence-free survival (RFS) and distant metastasis-free survival (DMFS), and Cox proportional hazards analysis were performed.
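
The Kaplan-Meier analysis named above can be illustrated with a minimal sketch of the product-limit estimator. This is not the authors' code: the function name `kaplan_meier` and the toy cohort are hypothetical, and the abstract does not say which software was used. At each time with observed recurrences, the running recurrence-free probability is multiplied by (1 - events / patients still at risk); censored patients leave the risk set without changing the estimate.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times  -- follow-up time per patient (any consistent unit)
    events -- 1 if recurrence was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) pairs.
    """
    n_at_risk = len(times)
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone exits the risk set at their own time
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Conditional probability of staying recurrence-free through t.
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort (years of follow-up), not data from the study.
times = [1, 2, 2, 3, 4, 5, 5, 5]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"year {t}: RFS = {s:.3f}")
```

A yearly RFS table like the one reported in the Results would correspond to reading such a curve at years 1 through 5.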

Results: Of 1,295 patients with stage I-IIIA NSCLC with surgical resections, 336 patients (25.9%) experienced recurrence, as identified through NLP. The NLP system achieved a precision of 94.3%, a recall of 93%, and an F1 score of 93.5. Among 336 patients, 52.4% had local/regional recurrences, 44% distant metastases, and 3.6% unknown recurrence. RFS rates at years 1-5 were 93%, 81%, 73%, 67%, and 61%, respectively (96%, 89%, 84%, 80%, and 75% for distant metastasis). Stage-specific RFS rates at year 5 were 73% (IA), 62% (IB), 47% (IIA), 46% (IIB), and 20% (IIIA). Stage IB patients had a significantly higher likelihood of recurrence versus stage IA (adjusted hazard ratio [aHR], 1.63; P = .02). The RFS was lower in patients with clinically significant TP53 alteration (versus TP53-negative or of unknown significance), affecting overall RFS (aHR, 1.89; P = .007) and DMFS (aHR, 2.47; P = .009) among stage IA/IB patients.
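
For readers less familiar with the NLP evaluation metrics quoted above, the following sketch shows how precision, recall, and F1 relate. The function name and the confusion counts are hypothetical; the abstract does not report the underlying counts or how F1 was averaged. The last lines take the harmonic mean of the rounded precision and recall from the abstract, which gives about 93.6, close to the reported 93.5; the small gap is plausibly a rounding effect, though that is an assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts.

    tp -- true positives  (recurrences correctly flagged)
    fp -- false positives (flagged, but no recurrence)
    fn -- false negatives (recurrences that were missed)
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")

# Harmonic mean of the rounded values quoted in the abstract.
reported_p, reported_r = 0.943, 0.93
f1_from_reported = 2 * reported_p * reported_r / (reported_p + reported_r)
print(f"F1 from rounded inputs: {f1_from_reported:.4f}")
```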

Conclusion: Our scalable NLP system enabled us to generate real-world insights into NSCLC recurrences, paving the way for predictive models for preventing, diagnosing, and treating NSCLC recurrence.

Source journal metrics: CiteScore 6.20; self-citation rate 4.80%; articles published 190