A Deep-Learning-Enabled Workflow to Estimate Real-World Progression-Free Survival in Patients with Metastatic Breast Cancer: Study Using De-identified Electronic Health Records.
Gowtham Varma, Rohit Kumar Yenukoti, Praveen Kumar M, Bandlamudi Sai Ashrit, K Purushotham, C Subash, Sunil Kumar Ravi, Verghese Kurien, Avinash Aman, Mithun Manoharan, Shashank Jaiswal, Akash Anand, Rakesh Barve, Viswanathan Thiagarajan, Patrick Lenehan, Scott A Soefje, Venky Soundararajan
JMIR Cancer. Published 2025-03-21. DOI: 10.2196/64697 (https://doi.org/10.2196/64697)
Abstract
Background: Progression-free survival (PFS) is a crucial endpoint in cancer drug research. Clinician-confirmed cancer progression documented in unstructured text (i.e., clinical notes), namely real-world PFS (rwPFS), has been shown to serve as a reasonable surrogate for real-world indicators in ascertaining progression endpoints. Response Evaluation Criteria in Solid Tumors (RECIST), which relies on serial imaging evaluations, is traditionally used in clinical trials but is not practical when working with real-world data. Manual abstraction of clinical progression from unstructured notes continues to be the gold standard. However, this process is resource-intensive and time-consuming. Natural language processing (NLP), a subdomain of machine learning, has in recent years shown promise in accelerating the extraction of tumor progression from real-world data.
Objective: We aimed to configure a pre-trained, general-purpose healthcare NLP framework to transform free-text clinical notes and radiology reports into structured progression events for studying rwPFS in metastatic breast cancer (mBC) cohorts.
Methods: This study developed and validated a novel semi-automated workflow to estimate rwPFS in patients with mBC using de-identified electronic health record (EHR) data from the nference nSights platform. The workflow was validated in a cohort of 316 patients with hormone receptor-positive, human epidermal growth factor receptor 2 (HER2)-negative mBC who were started on palbociclib and letrozole combination therapy between January 2015 and December 2021. Ground-truth datasets were curated to evaluate the workflow's performance at both the sentence and patient levels. NLP-captured progression and a change in therapy line were considered outcome events, while death, loss to follow-up, and end of the study period were considered censoring events for rwPFS computation. Peak reduction and cumulative decline in Patient Health Questionnaire-8 (PHQ-8) scores were analyzed in the progressed and non-progressed patient subgroups.
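The event and censoring rules described in the Methods can be illustrated with a minimal Python sketch. It is not the study's implementation: the `PatientTimeline` fields, the `STUDY_END` date, the tie-breaking rule, and the plain Kaplan-Meier median are all assumptions made for illustration, and the toy cohort is invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class PatientTimeline:
    """Hypothetical per-patient record; field names are illustrative only."""
    index_date: date                            # start of combination therapy
    progression_date: Optional[date] = None     # first NLP-captured progression
    therapy_change_date: Optional[date] = None  # change in therapy line
    death_date: Optional[date] = None
    last_followup_date: Optional[date] = None


STUDY_END = date(2021, 12, 31)  # assumed end of the study period


def time_and_event(p: PatientTimeline, study_end: date = STUDY_END) -> Tuple[int, bool]:
    """Return (days from index, event flag) under the abstract's rules.

    Outcome events: progression or change in therapy line.
    Censoring: death, loss to follow-up, end of study period.
    Assumption: on a tie, the outcome event takes precedence.
    """
    events = [d for d in (p.progression_date, p.therapy_change_date) if d is not None]
    censors = [d for d in (p.death_date, p.last_followup_date) if d is not None]
    censors.append(study_end)
    first_censor = min(censors)
    if events and min(events) <= first_censor:
        return (min(events) - p.index_date).days, True
    return (first_censor - p.index_date).days, False


def km_median(observations: List[Tuple[int, bool]]) -> Optional[int]:
    """Kaplan-Meier median: first time the survival estimate drops to 0.5 or below."""
    data = sorted(observations)
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        at_t = [e for (tt, e) in data if tt == t]
        n_events = sum(at_t)
        if n_events:
            survival *= 1 - n_events / at_risk
            if survival <= 0.5:
                return t
        at_risk -= len(at_t)
        i += len(at_t)
    return None  # median not reached


# Toy cohort (invented data, for illustration only).
start = date(2020, 1, 1)
cohort = [
    PatientTimeline(start, progression_date=date(2020, 1, 31)),
    PatientTimeline(start, therapy_change_date=date(2020, 3, 1)),
    PatientTimeline(start, death_date=date(2020, 2, 10)),
    PatientTimeline(start),  # censored at the assumed study end
]
observations = [time_and_event(p) for p in cohort]
median_days = km_median(observations)
```

Note that, following the abstract, death is treated here as a censoring event rather than an outcome event; a confidence interval for the median would require an additional method not shown in this sketch.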
Results: The configured clinical NLP engine achieved a sentence-level progression capture accuracy of 98.2%. At the patient level, initial progression was captured within ±30 days with 88% accuracy. The median rwPFS for the study cohort (N=316) was 20 months (95% CI: 18.0-25.0). In a validation subset (N=100), rwPFS determined by manual curation was 25 months (95% CI: 15-35), closely aligning with the computational workflow's estimate of 22 months (95% CI: 15-35). A sub-analysis yielded rwPFS estimates of 30 months (95% CI: 24.0-39.0) from radiology reports and 23 months (95% CI: 19.0-28.0) from clinical notes, highlighting the importance of integrating multiple note sources. External validation also demonstrated high accuracy (92.5% at the sentence level; 90.2% at the patient level). Sensitivity analysis showed stable rwPFS estimates across varying levels of missing source data and event definitions. Peak reduction and cumulative decline in PHQ-8 scores during the study period highlighted significant associations between patient-reported outcomes and disease progression.
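As a rough illustration of the patient-level metric reported above, the following Python sketch scores a workflow-derived first progression date as correct when it falls within ±30 days of the manually curated date. The function name, the data layout, and the treatment of patients without progression (agreement on "no progression" counts as correct) are assumptions; the abstract does not specify these details.

```python
from datetime import date
from typing import Dict, Optional


def within_window_accuracy(
    curated: Dict[str, Optional[date]],
    predicted: Dict[str, Optional[date]],
    window_days: int = 30,
) -> float:
    """Fraction of patients whose predicted first progression matches curation.

    A patient counts as correct when both sources report no progression (None),
    or when both report a date and the dates differ by at most window_days.
    """
    if not curated:
        raise ValueError("curated ground truth must not be empty")
    correct = 0
    for patient_id, truth in curated.items():
        pred = predicted.get(patient_id)
        if truth is None and pred is None:
            correct += 1
        elif truth is not None and pred is not None:
            if abs((pred - truth).days) <= window_days:
                correct += 1
    return correct / len(curated)


# Invented example: two of four patients agree under the ±30-day rule.
curated_dates = {
    "p1": date(2020, 3, 1),
    "p2": None,
    "p3": date(2020, 6, 15),
    "p4": date(2020, 9, 1),
}
predicted_dates = {
    "p1": date(2020, 3, 20),  # 19 days off: correct
    "p2": None,               # agreement on no progression: correct
    "p3": date(2020, 8, 1),   # 47 days off: incorrect
    "p4": None,               # missed progression: incorrect
}
accuracy = within_window_accuracy(curated_dates, predicted_dates)
```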
Conclusions: This workflow enables rapid and reliable determination of rwPFS in patients with mBC receiving combination therapy. Further validation across more diverse external datasets and other cancer types is needed to establish broader generalizability.