CheXternal: generalization of deep learning models for chest X-ray interpretation to photos of chest X-rays and external clinical settings

P. Rajpurkar, Anirudh Joshi, A. Pareek, A. Ng, M. Lungren
Proceedings of the ACM Conference on Health, Inference, and Learning. Published 2021-02-17. DOI: 10.1145/3450439.3451876 (https://doi.org/10.1145/3450439.3451876)
Citations: 10

Abstract

Recent advances in training deep learning models have demonstrated the potential to provide accurate chest X-ray interpretation and increase access to radiology expertise. However, poor generalization due to data distribution shifts in clinical settings is a key barrier to implementation. In this study, we measured the diagnostic performance of 8 different chest X-ray models when applied, without any fine-tuning, to (1) smartphone photos of chest X-rays and (2) external datasets. All models were developed by different groups, submitted to the CheXpert challenge, and re-applied to the test datasets without further tuning. We found that (1) on photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance, but only 3 performed significantly worse than radiologists on average, and (2) on the external set, none of the models performed statistically significantly worse than radiologists, and 5 models performed statistically significantly better than radiologists. Our results demonstrate that some chest X-ray models, under clinically relevant distribution shifts, were comparable to radiologists while other models were not. Future work should investigate aspects of model training procedures and dataset collection that influence generalization in the presence of data distribution shifts.
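The abstract reports "statistically significant" performance drops but does not say which metric or test was used. As an illustration only, one common way to check such a drop is a paired bootstrap over studies: resample the same cases for both conditions and look at the confidence interval of the metric difference. The sketch below assumes binary predictions and uses plain accuracy as a stand-in metric; the function names and defaults are hypothetical and are not taken from the paper.

```python
import random


def accuracy(labels, preds):
    """Fraction of predictions that match the labels (stand-in metric)."""
    return sum(int(y == p) for y, p in zip(labels, preds)) / len(labels)


def paired_bootstrap_diff(labels, preds_a, preds_b, metric=accuracy,
                          n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(preds_a) - metric(preds_b).

    preds_a and preds_b are predictions for the SAME studies under two
    conditions (for example, original X-rays vs. photos of them), so each
    resample draws the same indices for both conditions.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        y = [labels[i] for i in idx]
        a = [preds_a[i] for i in idx]
        b = [preds_b[i] for i in idx]
        diffs.append(metric(y, a) - metric(y, b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Under this sketch, a drop would be called significant at level alpha when the returned interval excludes zero. Any other per-study metric can be passed through the `metric` argument, provided it accepts the labels and predictions as two sequences.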