Fair Text to Medical Image Diffusion Model with Subgroup Distribution Aligned Tuning

Xu Han, Fangfang Fan, Jingzhao Rong, Zhen Li, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu

Proceedings of SPIE--the International Society for Optical Engineering, vol. 13411, 2025. DOI: 10.1117/12.3046450. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360154/pdf/
Abstract
The Text to Medical Image (T2MedI) approach using latent diffusion models holds significant promise for addressing the scarcity of medical imaging data and for elucidating the appearance distribution of lesions corresponding to specific patient status descriptions. However, as with natural image synthesis models, our investigations reveal that a T2MedI model can exhibit bias toward certain subgroups, potentially neglecting minority groups present in the training dataset. In this study, we first build a T2MedI model adapted from the pre-trained Imagen framework. It employs a fixed Contrastive Language-Image Pre-training (CLIP) text encoder, and its decoder is fine-tuned on medical images from the Radiology Objects in Context (ROCO) dataset. We conduct both qualitative and quantitative analyses to examine its gender bias. To address this issue, we propose a subgroup distribution alignment method applied during fine-tuning on a target application dataset. Specifically, the process involves an alignment loss, guided by an off-the-shelf sensitivity-subgroup classifier, that matches the classification probabilities of the generated images to those expected in the target dataset. In addition, we preserve image quality through a CLIP-consistency regularization term based on a knowledge distillation framework. For evaluation, we designate the BraTS18 dataset as the target and develop a gender classifier from its brain magnetic resonance (MR) imaging slices. Our method significantly mitigates gender representation inconsistencies in the generated MR images, aligning them more closely with the gender distribution of the BraTS18 dataset.
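To make the two fine-tuning objectives concrete, the sketch below shows one plausible PyTorch formulation of the subgroup alignment loss and the CLIP-consistency regularizer. All function names, the choice of KL divergence for the alignment term, and MSE for the distillation term are assumptions for illustration; the abstract does not specify the paper's exact losses.

```python
# Minimal PyTorch sketch (hypothetical; exact formulation not given in the abstract).
import torch
import torch.nn.functional as F


def alignment_loss(gen_images: torch.Tensor,
                   subgroup_classifier: torch.nn.Module,
                   target_dist: torch.Tensor) -> torch.Tensor:
    """Pull the subgroup distribution of generated images toward the target.

    gen_images:          batch of images sampled from the diffusion model
    subgroup_classifier: frozen, off-the-shelf sensitivity-subgroup classifier
                         (e.g., a gender classifier trained on BraTS18 MR slices)
    target_dist:         expected subgroup proportions in the target dataset,
                         shape (num_subgroups,), summing to 1
    """
    logits = subgroup_classifier(gen_images)              # (B, num_subgroups)
    batch_dist = F.softmax(logits, dim=-1).mean(dim=0)    # mean predicted subgroup probs
    # KL divergence between the batch's subgroup distribution and the target's;
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(batch_dist.log(), target_dist, reduction="sum")


def clip_consistency_loss(student_feats: torch.Tensor,
                          teacher_feats: torch.Tensor) -> torch.Tensor:
    """Knowledge-distillation-style regularizer: keep the fine-tuned model's
    CLIP image features close to those of the frozen pre-trained model, so
    image quality is preserved while the subgroup distribution shifts."""
    return F.mse_loss(student_feats, teacher_feats)


# Hypothetical combined objective during fine-tuning, with lambda_align and
# lambda_clip as tunable weights:
#   loss = diffusion_loss \
#          + lambda_align * alignment_loss(gen_images, classifier, target_dist) \
#          + lambda_clip * clip_consistency_loss(student_feats, teacher_feats)
```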