胸部 X 射线中 COVID-19 检测深度学习模型的多中心基准测试

Rachael Harkness, A. F. Frangi, K. Zucker, Nishant Ravikumar
{"title":"胸部 X 射线中 COVID-19 检测深度学习模型的多中心基准测试","authors":"Rachael Harkness, A. F. Frangi, K. Zucker, Nishant Ravikumar","doi":"10.3389/fradi.2024.1386906","DOIUrl":null,"url":null,"abstract":"This study is a retrospective evaluation of the performance of deep learning models that were developed for the detection of COVID-19 from chest x-rays, undertaken with the goal of assessing the suitability of such systems as clinical decision support tools.Models were trained on the National COVID-19 Chest Imaging Database (NCCID), a UK-wide multi-centre dataset from 26 different NHS hospitals and evaluated on independent multi-national clinical datasets. The evaluation considers clinical and technical contributors to model error and potential model bias. Model predictions are examined for spurious feature correlations using techniques for explainable prediction.Models performed adequately on NHS populations, with performance comparable to radiologists, but generalised poorly to international populations. Models performed better in males than females, and performance varied across age groups. Alarmingly, models routinely failed when applied to complex clinical cases with confounding pathologies and when applied to radiologist defined “mild” cases.This comprehensive benchmarking study examines the pitfalls in current practices that have led to impractical model development. Key findings highlight the need for clinician involvement at all stages of model development, from data curation and label definition, to model evaluation, to ensure that all clinical factors and disease features are appropriately considered during model design. This is imperative to ensure automated approaches developed for disease detection are fit-for-purpose in a clinical setting.","PeriodicalId":73101,"journal":{"name":"Frontiers in radiology","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-centre benchmarking of deep learning models for COVID-19 detection in chest x-rays\",\"authors\":\"Rachael Harkness, A. F. Frangi, K. Zucker, Nishant Ravikumar\",\"doi\":\"10.3389/fradi.2024.1386906\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study is a retrospective evaluation of the performance of deep learning models that were developed for the detection of COVID-19 from chest x-rays, undertaken with the goal of assessing the suitability of such systems as clinical decision support tools.Models were trained on the National COVID-19 Chest Imaging Database (NCCID), a UK-wide multi-centre dataset from 26 different NHS hospitals and evaluated on independent multi-national clinical datasets. The evaluation considers clinical and technical contributors to model error and potential model bias. Model predictions are examined for spurious feature correlations using techniques for explainable prediction.Models performed adequately on NHS populations, with performance comparable to radiologists, but generalised poorly to international populations. Models performed better in males than females, and performance varied across age groups. Alarmingly, models routinely failed when applied to complex clinical cases with confounding pathologies and when applied to radiologist defined “mild” cases.This comprehensive benchmarking study examines the pitfalls in current practices that have led to impractical model development. Key findings highlight the need for clinician involvement at all stages of model development, from data curation and label definition, to model evaluation, to ensure that all clinical factors and disease features are appropriately considered during model design. This is imperative to ensure automated approaches developed for disease detection are fit-for-purpose in a clinical setting.\",\"PeriodicalId\":73101,\"journal\":{\"name\":\"Frontiers in radiology\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in radiology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fradi.2024.1386906\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in radiology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fradi.2024.1386906","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

本研究是对为检测胸部X光片中的COVID-19而开发的深度学习模型的性能进行的回顾性评估,目的是评估此类系统作为临床决策支持工具的适用性。模型在国家COVID-19胸部成像数据库(NCCID)上进行了训练,该数据库是英国范围内的多中心数据集,来自26家不同的国家医疗服务系统医院,并在独立的多国临床数据集上进行了评估。评估考虑了导致模型误差和潜在模型偏差的临床和技术因素。使用可解释预测技术检查了模型预测的虚假特征相关性。模型在英国国家医疗服务系统人群中的表现良好,与放射科医生的表现相当,但在国际人群中的普适性较差。模型在男性中的表现优于女性,在不同年龄组中的表现也不尽相同。令人担忧的是,当模型应用于具有混杂病理的复杂临床病例时,以及应用于放射科医生定义的 "轻度 "病例时,通常都会失败。这项综合基准研究探讨了当前实践中导致模型开发不切实际的陷阱。主要发现强调了临床医生参与模型开发各个阶段的必要性,从数据整理和标签定义到模型评估,以确保在模型设计过程中适当考虑所有临床因素和疾病特征。这对于确保为疾病检测开发的自动方法适合临床环境至关重要。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Multi-centre benchmarking of deep learning models for COVID-19 detection in chest x-rays
This study is a retrospective evaluation of the performance of deep learning models that were developed for the detection of COVID-19 from chest x-rays, undertaken with the goal of assessing the suitability of such systems as clinical decision support tools.Models were trained on the National COVID-19 Chest Imaging Database (NCCID), a UK-wide multi-centre dataset from 26 different NHS hospitals and evaluated on independent multi-national clinical datasets. The evaluation considers clinical and technical contributors to model error and potential model bias. Model predictions are examined for spurious feature correlations using techniques for explainable prediction.Models performed adequately on NHS populations, with performance comparable to radiologists, but generalised poorly to international populations. Models performed better in males than females, and performance varied across age groups. Alarmingly, models routinely failed when applied to complex clinical cases with confounding pathologies and when applied to radiologist defined “mild” cases.This comprehensive benchmarking study examines the pitfalls in current practices that have led to impractical model development. Key findings highlight the need for clinician involvement at all stages of model development, from data curation and label definition, to model evaluation, to ensure that all clinical factors and disease features are appropriately considered during model design. This is imperative to ensure automated approaches developed for disease detection are fit-for-purpose in a clinical setting.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.20
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信