Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study.

Gregory Holste, Song Wang, Ziyu Jiang, Thomas C Shen, George Shih, Ronald M Summers, Yifan Peng, Zhangyang Wang
DOI: 10.1007/978-3-031-17027-0_3 (https://doi.org/10.1007/978-3-031-17027-0_3)
Published in: Data augmentation, labelling, and imperfections: second MICCAI workshop, DALI 2022, held in conjunction with MICCAI 2022, Singapore, September 22, 2022, proceedings
Publication date: 2022-09-01 (Epub 2022-09-16)
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9618235/pdf/nihms-1844023.pdf

Abstract

Imaging exams, such as chest radiography, will yield a small set of common findings and a much larger set of uncommon findings. While a trained radiologist can learn the visual presentation of rare conditions by studying a few representative examples, teaching a machine to learn from such a "long-tailed" distribution is much more difficult, as standard methods would be easily biased toward the most frequent classes. In this paper, we present a comprehensive benchmark study of the long-tailed learning problem in the specific domain of thorax diseases on chest X-rays. We focus on learning from naturally distributed chest X-ray data, optimizing classification accuracy over not only the common "head" classes, but also the rare yet critical "tail" classes. To accomplish this, we introduce a challenging new long-tailed chest X-ray benchmark to facilitate research on developing long-tailed learning methods for medical image classification. The benchmark consists of two chest X-ray datasets for 19- and 20-way thorax disease classification, containing classes with as many as 53,000 and as few as 7 labeled training images. We evaluate both standard and state-of-the-art long-tailed learning methods on this new benchmark, analyzing which aspects of these methods are most beneficial for long-tailed medical image classification and summarizing insights for future algorithm design. The datasets, trained models, and code are available at https://github.com/VITA-Group/LongTailCXR.
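
To make the scale of the imbalance concrete, here is a minimal sketch of one standard long-tailed baseline, inverse-frequency loss weighting. It is an illustration only and is not claimed to be one of the methods the paper evaluates. The function names are hypothetical, and the class names and the middle count in `example_counts` are made-up placeholders; only the 53,000 and 7 extremes come from the abstract.

```python
from typing import Dict


def imbalance_ratio(class_counts: Dict[str, int]) -> float:
    """Return the largest class size divided by the smallest class size."""
    counts = list(class_counts.values())
    return max(counts) / min(counts)


def inverse_frequency_weights(class_counts: Dict[str, int]) -> Dict[str, float]:
    """Return per-class loss weights proportional to 1 / count.

    Weights are scaled as total / (num_classes * count), so the
    average weight over all training samples is exactly 1.
    """
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        name: total / (num_classes * count)
        for name, count in class_counts.items()
    }


# Hypothetical counts: only the 53,000 and 7 extremes are from the abstract.
example_counts = {"head_class": 53_000, "medium_class": 1_200, "tail_class": 7}

ratio = imbalance_ratio(example_counts)
weights = inverse_frequency_weights(example_counts)

print(f"imbalance ratio: {ratio:.1f}")
for name, weight in weights.items():
    print(f"{name}: weight {weight:.4f}")
```

Weights of this form could be passed to a weighted loss, for example the `weight` argument of `torch.nn.CrossEntropyLoss`, so that errors on tail classes cost more than errors on head classes. With a ratio this extreme, the tail weight becomes very large, which is one reason simple reweighting alone may be unstable.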
