External Test of a Deep Learning Algorithm for Pulmonary Nodule Malignancy Risk Stratification Using European Screening Data.

IF 15.2 · Zone 1 (Medicine) · JCR Q1: RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Radiology · Pub Date: 2025-09-01 · DOI: 10.1148/radiol.250874

Noa Antonissen, Kiran Vaidhya Venkadesh, Renate Dinnessen, Ernst Th Scholten, Zaigham Saghir, Mario Silva, Ugo Pastorino, Grigory Sidorenkov, Marjolein A Heuvelmans, Geertruida H de Bock, Firdaus A A Mohamed Hoesein, Pim A de Jong, Harry J M Groen, Rozemarijn Vliegenthart, Hester A Gietema, Mathias Prokop, Cornelia Schaefer-Prokop, Colin Jacobs

Radiology 2025;316(3):e250874.

Citations: 0

Abstract

Background: Low-dose CT screening reduces lung cancer-related deaths but has high rates of false-positive findings. A deep learning (DL) algorithm could improve nodule risk stratification but requires robust external testing.

Purpose: To externally test a DL algorithm for nodule malignancy risk estimation using pooled data from three large European lung cancer screening trials.

Materials and Methods: In this retrospective study, a DL algorithm trained on National Lung Screening Trial data was externally tested using baseline CT scans from the Danish Lung Cancer Screening Trial, the Multicentric Italian Lung Detection trial, and the Dutch-Belgian Lung Cancer Screening Trial. Performance was assessed across the pooled cohort and two subsets: subset A, including indeterminate nodules (5-15 mm); and subset B, including cancers size-matched to benign nodules (1:2 ratio). Performance, including the area under the receiver operating characteristic curve (AUC), was compared with the Pan-Canadian Early Detection of Lung Cancer (PanCan) model.

Results: The pooled cohort included 4146 participants (median age, 58 years; 78% male participants; median smoking history, 38 pack-years) with 7614 benign and 180 malignant nodules. The DL algorithm achieved AUCs of 0.98, 0.96, and 0.94 for cancers diagnosed within 1 year, 2 years, and throughout screening, respectively, compared with 0.98, 0.94, and 0.93 (P = .19, .02, and .46, respectively) for the PanCan model. In subset A (129 malignant and 2086 benign nodules), DL significantly outperformed PanCan across the same cancer diagnosis timeframes (respective AUCs: 0.95, 0.94, and 0.90 vs 0.91, 0.88, and 0.86; all P < .05). At 100% sensitivity for cancers diagnosed within 1 year, DL classified 68.1% of benign cases as low risk versus 47.4% for the PanCan model, a 39.4% relative reduction in false-positive findings. In subset B (180 malignant and 360 benign nodules), the AUC of the DL algorithm versus the PanCan model was 0.79 versus 0.60 (P < .01), respectively.

Conclusion: The DL algorithm outperformed the PanCan model across multiple European screening datasets, demonstrating superior malignancy prediction while substantially reducing false-positive classifications for indeterminate nodules.

© RSNA, 2025. Supplemental material is available for this article.
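The Results rest on three standard quantities:

- the AUC;
- the share of benign nodules called low risk when every cancer is still flagged (100% sensitivity);
- the relative reduction in false positives derived from that share.

The Python sketch below shows one way to compute them. It makes several assumptions:

- The function names are hypothetical.
- "Low risk" is assumed to mean a score below the lowest malignant score. This is one way to set a 100%-sensitivity operating point, not necessarily the authors' method.
- The toy scores are invented and are not study data.

Only the 68.1% and 47.4% inputs come from the abstract. The relative-reduction arithmetic reproduces the reported 39.4% figure: (52.6 - 31.9) / 52.6 ≈ 0.394.

```python
"""Sketch of AUC, 100%-sensitivity operating point, and relative FP reduction.

Assumed definitions for illustration only; not the study's code.
"""
from typing import Sequence


def auc_mann_whitney(malignant: Sequence[float], benign: Sequence[float]) -> float:
    """AUC as the probability that a random malignant nodule scores higher
    than a random benign one; ties count as half a win. O(n*m), fine for a sketch."""
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


def low_risk_fraction_at_full_sensitivity(
    malignant: Sequence[float], benign: Sequence[float]
) -> float:
    """Fraction of benign nodules scoring strictly below the lowest malignant score.

    Assumption: at 100% sensitivity the threshold sits at the lowest malignant
    score, so every cancer is flagged and only lower-scoring benign nodules
    are classed as low risk.
    """
    threshold = min(malignant)
    return sum(1 for b in benign if b < threshold) / len(benign)


def relative_fp_reduction(low_risk_new: float, low_risk_ref: float) -> float:
    """Relative reduction in false positives, where the FP rate among benign
    nodules is 1 minus the low-risk fraction."""
    fp_new = 1.0 - low_risk_new
    fp_ref = 1.0 - low_risk_ref
    return (fp_ref - fp_new) / fp_ref


# Figures reported for subset A: DL 68.1% vs PanCan 47.4% of benign nodules low risk.
reduction = relative_fp_reduction(0.681, 0.474)
print(f"{reduction:.1%}")  # 39.4%

# Hypothetical toy scores, not study data.
mal = [0.9, 0.7, 0.6]
ben = [0.1, 0.2, 0.65, 0.3]
print(auc_mann_whitney(mal, ben))  # 11/12, about 0.917
print(low_risk_fraction_at_full_sensitivity(mal, ben))  # 0.75
```

The pairwise AUC is chosen here because it makes the definition explicit. For large cohorts, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.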


Source journal: Radiology (Medicine – Nuclear Medicine)
CiteScore: 35.20
Self-citation rate: 3.00%
Articles published: 596
Review time: 3.6 months

Journal description: Published regularly since 1923 by the Radiological Society of North America (RSNA), Radiology has long been recognized as the authoritative reference for the most current, clinically relevant, and highest-quality research in the field of radiology. Each month the journal publishes approximately 240 pages of peer-reviewed original research, authoritative reviews, well-balanced commentary on significant articles, and expert opinion on new techniques and technologies. Radiology publishes cutting-edge and impactful imaging research articles in radiology and medical imaging in order to help improve human health.