How do deep-learning models generalize across populations? Cross-ethnicity generalization of COPD detection.

Impact factor 4.1; Zone 2 (Medicine); JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging)

Insights into Imaging Pub Date : 2024-08-07 DOI:10.1186/s13244-024-01781-x

Silvia D Almeida, Tobias Norajitra, Carsten T Lüth, Tassilo Wald, Vivienn Weru, Marco Nolden, Paul F Jäger, Oyunbileg von Stackelberg, Claus Peter Heußel, Oliver Weinheimer, Jürgen Biederer, Hans-Ulrich Kauczor, Klaus Maier-Hein

Insights into Imaging 15 (1): 198. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11306482/pdf/


Abstract

Objectives: To evaluate the performance and potential biases of deep-learning models in detecting chronic obstructive pulmonary disease (COPD) on chest CT scans across different ethnic groups, specifically non-Hispanic White (NHW) and African American (AA) populations.

Materials and methods: Inspiratory chest CT and clinical data from 7549 participants in the Genetic Epidemiology of COPD (COPDGene) study (mean age 62 years, interquartile range 56-69), including 5240 NHW and 2309 AA individuals, were retrospectively analyzed. Several factors influencing binary COPD classification performance in the different ethnic populations were examined: (1) training population: NHW-only, AA-only, a balanced set (half NHW, half AA), and the entire set (all NHW + AA); (2) learning strategy: three supervised learning (SL) vs. three self-supervised learning (SSL) methods. Distribution shifts across ethnicity were further assessed for the top-performing methods.
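The four training populations described above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the balanced set was built, so the sketch assumes random downsampling of the larger group to the size of the smaller one. The function name `make_training_sets`, the record fields, and the toy cohort are all hypothetical.

```python
import random


def make_training_sets(records, seed=0):
    """Split a cohort into four training configurations.

    `records` is a list of dicts with an 'ethnicity' key set to
    'NHW' or 'AA' (hypothetical schema, for illustration only).
    """
    rng = random.Random(seed)
    nhw = [r for r in records if r["ethnicity"] == "NHW"]
    aa = [r for r in records if r["ethnicity"] == "AA"]

    # Assumption: "balanced" means equal counts per group, obtained by
    # randomly downsampling the larger group.
    n = min(len(nhw), len(aa))
    balanced = rng.sample(nhw, n) + rng.sample(aa, n)

    return {
        "NHW-only": nhw,
        "AA-only": aa,
        "balanced": balanced,
        "all": nhw + aa,
    }


# Toy cohort (not the study data): 7 NHW and 3 AA records.
cohort = [{"id": i, "ethnicity": "NHW"} for i in range(7)]
cohort += [{"id": 100 + i, "ethnicity": "AA"} for i in range(3)]

sets = make_training_sets(cohort)
sizes = {name: len(subset) for name, subset in sets.items()}
```

Note that under this assumption the balanced set is smaller than the entire set, so any comparison between the two mixes the effect of balance with the effect of training-set size.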

Results: The learning strategy significantly influenced model performance, with SSL methods outperforming SL methods (p < 0.001) across all training configurations. Training on balanced datasets containing NHW and AA individuals improved model performance compared to training on population-specific datasets. Distribution shifts were found between ethnicities for the same health status, particularly when models were trained with nearest-neighbor contrastive SSL. Training on a balanced dataset resulted in fewer distribution shifts across ethnicity and health status, highlighting its efficacy in reducing biases.
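To make the per-population comparison concrete, the sketch below computes the AUC separately for each ethnic group from model scores. It is a minimal illustration, not the authors' evaluation code: the helper names `auc` and `auc_by_group` and the toy labels and scores are invented for the example. The AUC is computed as the fraction of (COPD, non-COPD) pairs in which the COPD case receives the higher score, with ties counted as one half.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Count pairs where the positive outranks the negative; ties count 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(labels, scores, groups):
    """Compute the AUC separately within each group label."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = auc([labels[i] for i in idx], [scores[i] for i in idx])
    return result


# Toy data (invented): 1 = COPD, 0 = no COPD.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.6, 0.7, 0.9, 0.2]
groups = ["NHW"] * 4 + ["AA"] * 4

per_group = auc_by_group(labels, scores, groups)
print(per_group)  # {'AA': 0.75, 'NHW': 1.0}
```

A gap between the per-group values, as in this toy output, is the kind of performance disparity that a pooled AUC over both groups would hide.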

Conclusion: Our findings demonstrate that utilizing SSL methods and training on large and balanced datasets can enhance COPD detection model performance and reduce biases across diverse ethnic populations. These findings emphasize the importance of equitable AI-driven healthcare solutions for COPD diagnosis.

Critical relevance statement: Self-supervised learning coupled with balanced datasets significantly improves COPD detection model performance, addressing biases across diverse ethnic populations and emphasizing the crucial role of equitable AI-driven healthcare solutions.

Key points: Self-supervised learning methods outperform supervised learning methods, showing higher AUC values (p < 0.001). Balanced datasets with non-Hispanic White and African American individuals improve model performance. Training on diverse datasets enhances COPD detection accuracy. Ethnically diverse datasets reduce bias in COPD detection models. SimCLR models mitigate biases in COPD detection across ethnicities.


Source journal

Insights into Imaging (Medicine: Radiology, Nuclear Medicine and Imaging)

CiteScore: 7.30

Self-citation rate: 4.30%

Articles published: 182

Review time: 13 weeks

About the journal: Insights into Imaging (I³) is a peer-reviewed open access journal published under the brand SpringerOpen. All content published in the journal is freely available online to anyone, anywhere. I³ continuously updates scientific knowledge and progress in best-practice standards in radiology through the publication of original articles and state-of-the-art reviews and opinions, along with recommendations and statements from the leading radiological societies in Europe. Founded by the European Society of Radiology (ESR), I³ creates a platform for educational material, guidelines and recommendations, and a forum for topics of controversy. A balanced combination of review articles, original papers, short communications from European radiological congresses and information on society matters makes I³ an indispensable source for current information in this field. I³ is owned by the ESR; however, authors retain copyright to their articles under the Creative Commons Attribution License (see Copyright and License Agreement). All articles can be read, redistributed and reused for free, as long as the author of the original work is cited properly. The open access fees (article-processing charges) for this journal are sponsored by the ESR for all members. The journal went open access in 2012, which means that all articles published since then are freely available online.