Machine learning evaluation of pneumonia severity: subgroup performance in the Medical Imaging and Data Resource Center modified radiographic assessment of lung edema mastermind challenge.
IF 1.7 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Karen Drukker, Samuel G Armato, Lubomir Hadjiiski, Judy Gichoya, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Hui Li, Kyle J Myers, Robert M Tomek, Heather M Whitney, Zi Zhang, Maryellen L Giger
Journal of Medical Imaging, vol. 12, no. 5, 054502. Published 2025-09-01 (Epub 2025-10-07). DOI: 10.1117/1.JMI.12.5.054502 (https://doi.org/10.1117/1.JMI.12.5.054502). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12503059/pdf/
Citations: 0
Abstract
Purpose: The Medical Imaging and Data Resource Center Mastermind Grand Challenge of modified radiographic assessment of lung edema (mRALE) tasked participants with developing machine learning techniques for automated COVID-19 severity assessment via mRALE scores on portable chest radiographs (CXRs). We examine potential biases across demographic subgroups for the best-performing models of the nine teams participating in the test phase of the challenge.
Approach: Models were evaluated against a nonpublic test set of CXRs (814 patients) annotated by radiologists for disease severity (mRALE score 0 to 24). Participants used a variety of data and methods for training. Performance was measured using quadratic-weighted kappa (QWK). Bias analyses considered demographics (sex, age, race, ethnicity, and their intersections) using QWK. In addition, for distinguishing no/mild versus moderate/severe disease, equal opportunity difference (EOD) and average absolute odds difference (AAOD) were calculated. Bias was defined as statistically significant QWK subgroup differences, or EOD outside [-0.1; 0.1], or AAOD outside [0; 0.1].
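The three metrics named in the Approach can be sketched in a few lines of Python. This is an illustrative sketch using the standard textbook definitions, not the challenge organizers' evaluation code. The function names, the sign convention for EOD (subgroup minus reference group), and the choice of reference group are assumptions made here. The abstract does not give the mRALE cut-off separating no/mild from moderate/severe disease, so the fairness functions take already-binarized labels (1 = moderate/severe).

```python
from collections import Counter

def quadratic_weighted_kappa(y_true, y_pred, n_classes=25):
    """QWK for integer ratings in 0..n_classes-1 (mRALE scores span 0..24).

    Undefined (division by zero) if the weighted expected disagreement is 0,
    e.g. when every reference and predicted score is the same single value.
    """
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))   # joint counts O[i, j]
    hist_true = Counter(y_true)               # marginal counts of reference scores
    hist_pred = Counter(y_pred)               # marginal counts of predicted scores
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2          # quadratic weight
            num += w * observed[(i, j)]                      # observed disagreement
            den += w * hist_true[i] * hist_pred[j] / n       # chance disagreement
    return 1.0 - num / den

def tpr_fpr(y_true, y_pred):
    """True-positive and false-positive rates for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

def equal_opportunity_difference(group, reference):
    """EOD = TPR(subgroup) - TPR(reference); sign convention is an assumption."""
    tpr_g, _ = tpr_fpr(*group)
    tpr_r, _ = tpr_fpr(*reference)
    return tpr_g - tpr_r

def average_absolute_odds_difference(group, reference):
    """AAOD = 0.5 * (|FPR difference| + |TPR difference|)."""
    tpr_g, fpr_g = tpr_fpr(*group)
    tpr_r, fpr_r = tpr_fpr(*reference)
    return 0.5 * (abs(fpr_g - fpr_r) + abs(tpr_g - tpr_r))

def flags_bias(eod, aaod, tol=0.1):
    """Bias rule from the abstract: EOD outside [-0.1; 0.1] or AAOD outside [0; 0.1]."""
    return abs(eod) > tol or aaod > tol

# Toy example with made-up labels: each group is (y_true, y_pred).
subgroup = ([1, 1, 0, 0], [1, 0, 0, 0])    # TPR 0.5, FPR 0.0
reference = ([1, 1, 0, 0], [1, 1, 1, 0])   # TPR 1.0, FPR 0.5
eod = equal_opportunity_difference(subgroup, reference)
aaod = average_absolute_odds_difference(subgroup, reference)
print(eod, aaod, flags_bias(eod, aaod))
```

The abstract's first bias criterion, a statistically significant QWK difference between subgroups, needs a significance test that the abstract does not specify, so it is left out of this sketch.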
Results: The nine models demonstrated good agreement with the reference standard (QWK 0.74 to 0.88). The winning model (QWK = 0.884 [0.819; 0.949]) was the only model without biases identified in terms of QWK. The runner-up model (QWK = 0.874 [0.813; 0.936]) showed no identified biases in terms of EOD and AAOD, whereas the winning model disadvantaged three subgroups in each of these metrics. The median number of disadvantaged subgroups for all models was 3.
Conclusions: The challenge demonstrated strong model performances but identified subgroup disparities. Bias analysis is essential as models with similar accuracy may exhibit varying fairness.
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.