Fair Text to Medical Image Diffusion Model with Subgroup Distribution Aligned Tuning.

Xu Han, Fangfang Fan, Jingzhao Rong, Zhen Li, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu
{"title":"Fair Text to Medical Image Diffusion Model with Subgroup Distribution Aligned Tuning.","authors":"Xu Han, Fangfang Fan, Jingzhao Rong, Zhen Li, Georges El Fakhri, Qingyu Chen, Xiaofeng Liu","doi":"10.1117/12.3046450","DOIUrl":null,"url":null,"abstract":"<p><p>The Text to Medical Image (T2MedI) approach using latent diffusion models holds significant promise for addressing the scarcity of medical imaging data and elucidating the appearance distribution of lesions corresponding to specific patient status descriptions. Like natural image synthesis models, our investigations reveal that the T2MedI model may exhibit biases towards certain subgroups, potentially neglecting minority groups present in the training dataset. In this study, we initially developed a T2MedI model adapted from the pre-trained Imagen framework. This model employs a fixed Contrastive Language-Image Pre-training (CLIP) text encoder, with its decoder fine-tuned using medical images from the Radiology Objects in Context (ROCO) dataset. We conduct both qualitative and quantitative analyses to examine its gender bias. To address this issue, we propose a subgroup distribution alignment method during fine-tuning on a target application dataset. Specifically, this process involves an alignment loss, guided by an off-the-shelf sensitivity-subgroup classifier, which aims to synchronize the classification probabilities between the generated images and those expected in the target dataset. Additionally, we preserve image quality through a CLIP-consistency regularization term, based on a knowledge distillation framework. For evaluation purposes, we designated the BraTS18 dataset as the target, and developed a gender classifier based on brain magnetic resonance (MR) imaging slices derived from it. Our methodology significantly mitigates gender representation inconsistencies in the generated MR images, aligning them more closely with the gender distribution in the BraTS18 dataset.</p>","PeriodicalId":74505,"journal":{"name":"Proceedings of SPIE--the International Society for Optical Engineering","volume":"13411 ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12360154/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of SPIE--the International Society for Optical Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.3046450","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/10 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Text to Medical Image (T2MedI) generation with latent diffusion models holds significant promise for addressing the scarcity of medical imaging data and for elucidating the appearance distribution of lesions corresponding to specific patient status descriptions. Our investigation reveals that, like natural image synthesis models, a T2MedI model may exhibit bias toward certain subgroups and neglect minority groups present in the training dataset. In this study, we first build a T2MedI model adapted from the pre-trained Imagen framework: it employs a fixed Contrastive Language-Image Pre-training (CLIP) text encoder, while its decoder is fine-tuned on medical images from the Radiology Objects in Context (ROCO) dataset. We conduct both qualitative and quantitative analyses to examine its gender bias. To address this bias, we propose a subgroup distribution alignment method applied during fine-tuning on a target application dataset. Specifically, an alignment loss, guided by an off-the-shelf sensitive-subgroup classifier, pushes the subgroup classification probabilities of the generated images toward those expected in the target dataset. In addition, we preserve image quality through a CLIP-consistency regularization term based on a knowledge distillation framework. For evaluation, we designate the BraTS18 dataset as the target and develop a gender classifier on brain magnetic resonance (MR) imaging slices derived from it. Our method significantly mitigates the gender representation inconsistency in the generated MR images, aligning it more closely with the gender distribution of the BraTS18 dataset.
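The abstract describes two training-time terms: a subgroup distribution alignment loss driven by a frozen sensitive-subgroup classifier, and a CLIP-consistency distillation term that anchors the fine-tuned model's outputs to those of the pre-tuning model. Below is a minimal PyTorch-style sketch of how such a combined objective could look. The function name, the loss weights, the choice of KL divergence for the alignment term, and cosine similarity for the distillation term are all illustrative assumptions; the abstract does not specify the exact formulation.

import torch
import torch.nn.functional as F

def fairness_tuning_losses(
    generated_images,     # batch sampled from the model being fine-tuned (B, C, H, W)
    reference_images,     # batch from the frozen pre-tuning (teacher) model, same prompts
    subgroup_classifier,  # frozen, off-the-shelf sensitive-subgroup (e.g., gender) classifier
    clip_image_encoder,   # frozen CLIP image encoder used for the consistency term
    target_distribution,  # expected subgroup probabilities in the target dataset, shape (K,)
    lambda_align=1.0,     # hypothetical weight for the alignment loss
    lambda_clip=0.1,      # hypothetical weight for the CLIP-consistency term
):
    """Sketch of a combined fairness-tuning objective; assumed form, not the paper's exact one."""
    # --- Subgroup distribution alignment loss ---
    # Average the classifier's softmax outputs over the batch to estimate the
    # subgroup distribution of the generated images, then pull that estimate
    # toward the target dataset's distribution (here with a KL divergence).
    logits = subgroup_classifier(generated_images)           # (B, K)
    batch_distribution = F.softmax(logits, dim=-1).mean(0)   # (K,)
    align_loss = F.kl_div(
        batch_distribution.log(), target_distribution, reduction="sum"
    )

    # --- CLIP-consistency regularization (knowledge distillation) ---
    # Keep the CLIP embeddings of the fine-tuned model's outputs close to those
    # of the pre-tuning model's outputs, preserving image quality and semantics.
    with torch.no_grad():
        teacher_feat = clip_image_encoder(reference_images)
    student_feat = clip_image_encoder(generated_images)
    clip_loss = 1.0 - F.cosine_similarity(student_feat, teacher_feat, dim=-1).mean()

    return lambda_align * align_loss + lambda_clip * clip_loss

Note that this sketch glosses over how gradients flow back through the diffusion sampler into the decoder weights; in practice the loss would be attached at a differentiable point of the sampling or denoising pipeline.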
