Fair Text to Medical Image Diffusion Model with Subgroup Distribution Aligned Tuning

Xu Han, Fangfang Fan, Jingzhao Rong, Zhen Li, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu

Proceedings of SPIE--the International Society for Optical Engineering, vol. 13411, 2025. DOI: 10.1117/12.3046450. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360154/pdf/
Abstract
The Text to Medical Image (T2MedI) approach using latent diffusion models holds significant promise for addressing the scarcity of medical imaging data and for elucidating the appearance distribution of lesions corresponding to specific patient status descriptions. However, as with natural image synthesis models, our investigations reveal that a T2MedI model can exhibit bias toward certain subgroups, potentially neglecting minority groups present in the training dataset. In this study, we first build a T2MedI model adapted from the pre-trained Imagen framework. It employs a fixed Contrastive Language-Image Pre-training (CLIP) text encoder, and its decoder is fine-tuned on medical images from the Radiology Objects in Context (ROCO) dataset. We conduct both qualitative and quantitative analyses to examine its gender bias. To address this issue, we propose a subgroup distribution alignment method applied during fine-tuning on a target application dataset. Specifically, the process involves an alignment loss, guided by an off-the-shelf sensitivity-subgroup classifier, that matches the classification probabilities of the generated images to those expected in the target dataset. In addition, we preserve image quality through a CLIP-consistency regularization term based on a knowledge distillation framework. For evaluation, we designate the BraTS18 dataset as the target and develop a gender classifier from its brain magnetic resonance (MR) imaging slices. Our method significantly mitigates gender representation inconsistencies in the generated MR images, aligning them more closely with the gender distribution of the BraTS18 dataset.
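To make the two fine-tuning objectives concrete, the sketch below shows one plausible PyTorch formulation of the subgroup alignment loss and the CLIP-consistency regularizer. All function names, the choice of KL divergence for the alignment term, and MSE for the distillation term are assumptions for illustration; the abstract does not specify the paper's exact losses.

```python
# Minimal PyTorch sketch (hypothetical; exact formulation not given in the abstract).
import torch
import torch.nn.functional as F


def alignment_loss(gen_images: torch.Tensor,
                   subgroup_classifier: torch.nn.Module,
                   target_dist: torch.Tensor) -> torch.Tensor:
    """Pull the subgroup distribution of generated images toward the target.

    gen_images:          batch of images sampled from the diffusion model
    subgroup_classifier: frozen, off-the-shelf sensitivity-subgroup classifier
                         (e.g., a gender classifier trained on BraTS18 MR slices)
    target_dist:         expected subgroup proportions in the target dataset,
                         shape (num_subgroups,), summing to 1
    """
    logits = subgroup_classifier(gen_images)              # (B, num_subgroups)
    batch_dist = F.softmax(logits, dim=-1).mean(dim=0)    # mean predicted subgroup probs
    # KL divergence between the batch's subgroup distribution and the target's;
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(batch_dist.log(), target_dist, reduction="sum")


def clip_consistency_loss(student_feats: torch.Tensor,
                          teacher_feats: torch.Tensor) -> torch.Tensor:
    """Knowledge-distillation-style regularizer: keep the fine-tuned model's
    CLIP image features close to those of the frozen pre-trained model, so
    image quality is preserved while the subgroup distribution shifts."""
    return F.mse_loss(student_feats, teacher_feats)


# Hypothetical combined objective during fine-tuning, with lambda_align and
# lambda_clip as tunable weights:
#   loss = diffusion_loss \
#          + lambda_align * alignment_loss(gen_images, classifier, target_dist) \
#          + lambda_clip * clip_consistency_loss(student_feats, teacher_feats)
```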