Quantitative assessment of impact of technical and population-based factors on fairness of AI models for chest X-ray scans.
Authors: Dmitry Cherezov, Pingfu Fu, Anant Madabhushi
Journal: Computers in Biology and Medicine, volume 198 Pt A, pages 111147 (impact factor 6.3, JCR Q1)
Publication date: 2025-10-06
Publication type: Journal Article
DOI: 10.1016/j.compbiomed.2025.111147 (https://doi.org/10.1016/j.compbiomed.2025.111147)
Citations: 0
Abstract
Ensuring fairness in diagnostic AI models is essential for their safe deployment in clinical practice. This study investigates fairness by jointly analyzing population-based factors (sex and race) and technical factors (imaging site and X-ray energy) using chest X-ray data. A total of 49 datasets covering over 321,000 patients and 960,000 images were used. Six experiments were conducted to evaluate the effect of these factors on model performance across classification scores, class activation maps (CAMs), and deep features (DFs). Fairness was assessed using effect sizes derived from Kolmogorov-Smirnov statistics. Within single datasets, performance differences between demographic groups were generally small, with effect sizes below 0.1 for classification scores and CAMs, and up to 0.2 for deep features by sex. However, much larger discrepancies were observed when comparing the same patient group across different imaging sites, with effect sizes ranging from 0.1 to 0.6 across all metrics. Our findings suggest that technical variability has a greater impact on model behavior than population-based factors. Notably, deep features revealed more substantial group differences than surface-level outputs like diagnostic probability scores or CAMs. The findings emphasize the need to evaluate fairness not only within datasets but also across institutions, comparing model performance on training versus external populations, thereby helping to identify fairness limitations that might not be visible through single-cohort analyses.
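The abstract does not spell out how effect sizes are derived from the Kolmogorov-Smirnov statistics, so the following is only an illustrative sketch under one assumption: that the two-sample KS statistic D (the largest gap between the empirical CDFs of two groups' model outputs) is itself read as the effect size. The function name `ks_statistic` and the score lists are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs of the
    two samples. It ranges from 0 (identical distributions) to 1 (no overlap).
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking the
    # pooled sample points is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical classification scores for the same patient group at two sites
# (assumed values for illustration only, not data from the study).
scores_site_1 = [0.10, 0.20, 0.30, 0.40, 0.50]
scores_site_2 = [0.30, 0.40, 0.50, 0.60, 0.70]

effect_size = ks_statistic(scores_site_1, scores_site_2)
print(f"KS effect size between sites: {effect_size:.2f}")
```

Under this reading, the same function could be applied to any scalar summary of model behavior (a classification score, a CAM-derived value, or a single deep feature), comparing either demographic groups within one dataset or one group across sites.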
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.