Impact of retraining and data partitions on the generalizability of a deep learning model in the task of COVID-19 classification on chest radiographs.

Impact factor: 1.9 (JCR Q3, RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

Journal of Medical Imaging 11(6): 064503. Publication date: 2024-11-01 (Epub 2024-12-26). DOI: 10.1117/1.JMI.11.6.064503

Mena Shenouda, Heather M Whitney, Maryellen L Giger, Samuel G Armato


Citations: 0

Abstract

Purpose: This study aimed to investigate the impact of different model retraining schemes and data partitioning on model performance in the task of COVID-19 classification on standard chest radiographs (CXRs), in the context of model generalizability.

Approach: Two datasets from the same institution were used: Set A (9860 patients, collected from 02/20/2020 to 02/03/2021) and Set B (5893 patients, collected from 03/15/2020 to 01/01/2022). An original deep learning (DL) model trained and tested in the task of COVID-19 classification using the initial partition of Set A achieved an area under the curve (AUC) value of 0.76, whereas evaluation on Set B yielded a significantly lower value of 0.67. To explore this discrepancy, four separate strategies were applied to the original model: (1) retraining using Set B, (2) fine-tuning using Set B, (3) $L_2$ regularization, and (4) repartitioning the training set from Set A 200 times and reporting the resulting AUC values.
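Strategy (4) can be illustrated with a small, self-contained sketch: shuffle one dataset, split it into training and test portions, fit a model on the training portion, record the test AUC, and repeat 200 times. Everything here is an assumption for illustration only: the data are synthetic two-feature cases, `centroid_scorer` is a hypothetical toy classifier standing in for the authors' deep learning model, and the 80/20 split ratio is not taken from the paper. The AUC is computed as the Mann-Whitney probability that a randomly chosen positive case outscores a randomly chosen negative one.

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative
    (Mann-Whitney formulation); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def centroid_scorer(train, test):
    """Hypothetical toy model: weight each feature by the difference between
    the positive-class and negative-class means in the training portion."""
    dims = len(train[0][0])
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    w = [statistics.fmean(x[d] for x in pos) - statistics.fmean(x[d] for x in neg)
         for d in range(dims)]
    return [sum(wi * xi for wi, xi in zip(w, x)) for x, _ in test]


def repartition_aucs(cases, score_fn, n_repeats=200, test_fraction=0.2, seed=0):
    """Hypothetical helper: reshuffle the cases n_repeats times, hold out a
    test fraction each time, train via score_fn, and collect the test AUCs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        shuffled = cases[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        labels = [y for _, y in test]
        results.append(auc(labels, score_fn(train, test)))
    return results


# Synthetic cases: (features, label), roughly 30% positive.
rng = random.Random(42)
cases = []
for _ in range(400):
    y = 1 if rng.random() < 0.3 else 0
    x = (rng.gauss(0.6 * y, 1.0), rng.gauss(0.2 * y, 1.0))
    cases.append((x, y))

aucs = repartition_aucs(cases, centroid_scorer, n_repeats=200)
print(f"mean AUC {statistics.mean(aucs):.3f} +/- {statistics.stdev(aucs):.3f}")
print(f"range {min(aucs):.3f} to {max(aucs):.3f}")
```

Because the synthetic data come from one fixed distribution, any spread in the 200 AUCs in this toy comes solely from how the cases were partitioned into training and test portions.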

Results: The model achieved the following AUC values (95% confidence interval) for the four methods: (1) 0.61 [0.56, 0.66]; (2) 0.70 [0.66, 0.73], both on Set B; (3) 0.76 [0.72, 0.79] on the initial test partition of Set A and 0.68 [0.66, 0.70] on Set B; and (4) $0.71 \pm 0.013$ on repartitions of Set A. The lowest AUC value (0.66 [0.62, 0.69]) of the Set A repartitions was no longer significantly different from the initial 0.67 achieved on Set B.
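The bracketed values are 95% confidence intervals, and the abstract does not state how they were computed. As an assumption for illustration only, the sketch below uses a percentile bootstrap: resample the test cases with replacement, recompute the AUC each time, and read off the 2.5th and 97.5th percentiles. The labels and scores are synthetic, and `bootstrap_auc_ci` is a hypothetical helper, not the authors' code.

```python
import random


def auc(labels, scores):
    """Mann-Whitney AUC computed from average ranks (tied scores share a mid-rank)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lo, hi


# Synthetic test set: 0/1 labels and noisy model outputs.
rng = random.Random(7)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
scores = [rng.gauss(0.8 * y, 1.0) for y in labels]
point, lo, hi = bootstrap_auc_ci(labels, scores)
print(f"AUC {point:.2f} [{lo:.2f}, {hi:.2f}]")
```

Comparing two such intervals by eye is only a rough guide; whether two AUCs differ significantly is normally decided with a dedicated test, such as DeLong's test or a bootstrap of the AUC difference.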

Conclusions: Different data repartitions of the same dataset used to train a DL model demonstrated significantly different performance values that helped explain the discrepancy between Set A and Set B and further demonstrated the limitations of model generalizability.


Source journal: Journal of Medical Imaging (RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING)

CiteScore: 4.10

Self-citation rate: 4.20%

Journal description: JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.