Development and Portability of a Text Mining Algorithm for Capturing Disease Progression in Electronic Health Records of Patients With Stage IV Non-Small Cell Lung Cancer.

Impact factor: 3.3 | JCR quartile: Q2 (Oncology)
JCO Clinical Cancer Informatics | Pub Date: 2024-10-01 | Epub Date: 2024-10-04 | DOI: 10.1200/CCI.24.00053
M V Verschueren, H Abedian Kalkhoran, M Deenen, B E E M van den Borne, J Zwaveling, L E Visser, L T Bloem, B J M Peters, E M W van de Garde
Citations: 0

Abstract

Purpose: The objective was to develop and evaluate the portability of a text mining algorithm for prospectively capturing disease progression in electronic health record (EHR) data of patients with metastatic non-small cell lung cancer (mNSCLC) treated with immunochemotherapy.

Methods: This study used EHR data from patients with mNSCLC receiving immunochemotherapy (between October 1, 2018, and December 31, 2022) in four Dutch hospitals. A text mining algorithm for capturing disease progression was developed in hospitals 1 and 2 and then transferred to hospitals 3 and 4 to evaluate portability. Performance metrics were calculated by comparing the algorithm's outcomes with manual chart review. In addition, data were simulated to become available over time to assess performance in real-time applications. Median progression-free survival (PFS) was calculated using the Kaplan-Meier method to compare text mining with manual chart review.
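The comparison against manual chart review can be illustrated with a minimal sketch. It assumes one binary progression label per patient from each source and treats chart review as the reference standard. The abstract does not name the specific performance scores, so the metrics below (sensitivity, specificity, PPV, NPV, accuracy) and all function and variable names are assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    accuracy: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else float("nan")


def compare_with_chart_review(
    text_mining: Sequence[bool], chart_review: Sequence[bool]
) -> Metrics:
    """Score hypothetical text mining labels against chart review labels.

    True means "progression captured" for that patient.
    """
    if len(text_mining) != len(chart_review):
        raise ValueError("both label lists must cover the same patients")

    pairs = list(zip(text_mining, chart_review))
    tp = sum(1 for p, r in pairs if p and r)          # both say progression
    tn = sum(1 for p, r in pairs if not p and not r)  # both say no progression
    fp = sum(1 for p, r in pairs if p and not r)      # text mining only
    fn = sum(1 for p, r in pairs if not p and r)      # chart review only

    return Metrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        accuracy=_ratio(tp + tn, len(pairs)),
    )


if __name__ == "__main__":
    # Made-up labels for six patients, purely to show the call.
    predicted = [True, True, False, False, True, False]
    reference = [True, True, False, False, False, True]
    print(compare_with_chart_review(predicted, reference))
```

To mimic the real-time simulation, the same function could be called repeatedly on labels derived only from records dated before each weekly cutoff; how those labels are derived is not described in the abstract.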

Results: During development and portability testing, the text mining algorithm performed well in capturing disease progression, with all performance scores >90%. When real-time performance was simulated, the performance scores in all four hospitals exceeded 90% from week 15 after the start of follow-up. The exact progression dates differed in 46 of 157 patients with progressive disease, but the numbers of patients labeled with progression too early (n = 24) and too late (n = 22) were well balanced, with discrepancies ranging from -116 to 384 days. The PFS curves constructed with text mining and manual chart review were nevertheless highly similar for each hospital.
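The two quantities reported here, the signed date discrepancy and the Kaplan-Meier median PFS, can be sketched as follows. This is a minimal standard-library illustration, not the authors' code. The sign convention (negative means text mining labeled progression earlier than chart review) is an assumption consistent with the reported range, and the function names and sample data are hypothetical.

```python
from datetime import date
from typing import Optional, Sequence


def signed_discrepancy_days(text_mining: date, chart_review: date) -> int:
    """Days between the two progression dates.

    Assumed convention: negative = text mining was too early,
    positive = text mining was too late.
    """
    return (text_mining - chart_review).days


def km_median(durations: Sequence[float], events: Sequence[bool]) -> Optional[float]:
    """Kaplan-Meier median: the first time the survival estimate drops to <= 0.5.

    durations: follow-up time per patient (any consistent unit).
    events:    True if progression was observed, False if censored.
    Returns None when the estimate never reaches 0.5.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        observed = 0  # progression events at time t
        leaving = 0   # everyone leaving the risk set at time t
        while i < len(pairs) and pairs[i][0] == t:
            observed += 1 if pairs[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            # Product-limit step: patients censored at t still count as at risk.
            survival *= 1 - observed / at_risk
            if survival <= 0.5:
                return t
        at_risk -= leaving
    return None


if __name__ == "__main__":
    # Made-up follow-up times (months) and event flags for six patients.
    months = [2, 4, 4, 6, 8, 10]
    progressed = [True, True, False, True, False, True]
    print(km_median(months, progressed))
    print(signed_discrepancy_days(date(2021, 3, 1), date(2021, 3, 11)))
```

Running `km_median` once on durations derived from text mining dates and once on durations derived from chart review dates gives the two medians to compare per hospital.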

Conclusion: In this study, an accurate text mining algorithm for capturing disease progression in the EHR data of patients with mNSCLC was developed. The algorithm was portable across different hospitals, and the performance over time was good, making this an interesting approach for prospective follow-up of multicenter cohorts.

Source journal metrics: CiteScore 6.20; self-citation rate 4.80%; articles published: 190