Aparajita Khan, Eunji Choi, Chloe Su, Anna Graber-Naidich, Solomon Henry, Mina L Satoyoshi, Archana Bhat, Allison W Kurian, Su-Ying Liang, Joel Neal, Michael Gould, Ann Leung, Heather A Wakelee, Leah M Backhus, Curtis Langlotz, Julie Wu, Summer S Han
JCO Clinical Cancer Informatics, volume 9, e2400279. Journal Article. Published 2025-07-01 (Epub 2025-07-23). DOI: 10.1200/CCI-24-00279 (https://doi.org/10.1200/CCI-24-00279). Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12309515/pdf/
Automatic Abstraction of Computed Tomography Imaging Indication Using Natural Language Processing for Evaluation of Surveillance Patterns in Long-Term Lung Cancer Survivors.
Purpose: Although computed tomography (CT) is routinely used to monitor patients with lung cancer (LC), real-world evaluations of the impact of CT surveillance on overall survival (OS) have been inconsistent. A major confounder is the absence of recorded imaging indications: patients undergo CT scans for purposes beyond surveillance, such as evaluation of symptoms (eg, cough) that are linked to poor survival. We propose a novel natural language processing model to predict CT imaging indications (surveillance versus other).
Methods: We used electronic health records of 585 long-term LC survivors (≥5 years) at Stanford, followed for up to 22 years. Their 3,362 post-5-year CT reports (including 1,672 manually annotated) were used for modeling by integrating structured variables (eg, CT intervals) with key-phrase analysis of radiology reports. A naïve analysis compared OS in patients with CT for any indication (including symptoms) versus those without post-5-year CT, as in previous studies. Using model-predicted indications, we conducted exploratory analyses to compare OS between those with post-5-year surveillance CT and those without.
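The combination of a structured variable with key-phrase analysis can be illustrated with a minimal sketch. This is not the authors' implementation: the phrase list, the function names, and the two-feature layout are assumptions made for illustration only, and the abstract does not state which key phrases or preprocessing steps were used.

```python
import re
from datetime import date

# Hypothetical phrase list, for illustration only; the study's actual
# key phrases are not given in the abstract.
SURVEILLANCE_PHRASES = ("surveillance", "routine follow-up", "annual follow-up")


def count_key_phrases(report_text, phrases=SURVEILLANCE_PHRASES):
    """Count case-insensitive occurrences of the phrases in a report."""
    text = report_text.lower()
    return sum(len(re.findall(re.escape(p), text)) for p in phrases)


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def build_features(report_text, scan_date, previous_scan_date):
    """Combine one structured variable with one text-derived variable."""
    interval = months_between(previous_scan_date, scan_date)
    return {
        # Structured variable: interval of at least 6 months since the last CT.
        "long_interval": int(interval >= 6),
        # Text variable: number of surveillance-related phrase matches.
        "key_phrase_count": count_key_phrases(report_text),
    }


example = build_features(
    "Indication: routine follow-up surveillance of treated lung cancer.",
    scan_date=date(2020, 7, 1),
    previous_scan_date=date(2020, 1, 1),
)
print(example)
```

Feature dictionaries of this kind could then be passed to any binary classifier that separates surveillance scans from scans ordered for other reasons.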
Results: The model showed high discrimination (AUC, 0.86), with key predictors including a longer interval (≥6 months) from the previous CT (odds ratio [OR], 5.50; P < .001) and surveillance-related key phrases (OR, 1.37; P = .03). Propensity-adjusted survival analysis indicated better OS for patients with any post-5-year surveillance CT versus those without (adjusted hazard ratio, 0.60; P = .016). By contrast, no significant survival difference was found (P = .53) between patients with any post-5-year CT and those without.
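For readers less familiar with the two model metrics reported above, the following sketch shows how an AUC and an odds ratio are conventionally computed. It assumes the odds ratios come from a logistic-type model, where OR = exp(coefficient); the abstract does not name the model family. The labels and scores are made-up toy values, not study data.

```python
import math


def auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(coefficient)


# Toy data: 1 = surveillance scan, 0 = other indication.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
print(auc(toy_labels, toy_scores))

# An OR of 5.50 corresponds to a log-odds coefficient of ln(5.50).
beta = math.log(5.50)
print(round(beta, 3), round(odds_ratio(beta), 2))
```

Under this reading, an OR of 5.50 means the odds that a scan is a surveillance scan are 5.5 times higher when the interval since the previous CT is at least 6 months, with the other predictors held fixed.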
Conclusion: Our model abstracted CT indications from real-world data with high discrimination. Exploratory analyses revealed an imaging-OS association that was obscured when indications were not considered, highlighting the model's potential for future real-world studies.