MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image Deformations

IF 10.7 | Zone 1 (Medicine) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

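Medical Image Analysis, vol. 102, Article 103540. Pub Date: 2025-03-15. DOI: 10.1016/j.media.2025.103540

Article page: https://www.sciencedirect.com/science/article/pii/S1361841525000878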

Tonmoy Hossain, Miaomiao Zhang


Citations: 0

Abstract



Geometric transformations have been widely used to enlarge training image sets. Existing methods often assume a unimodal distribution of the underlying transformations between images, which limits their effectiveness when the data follow multimodal distributions. In this paper, we propose a novel model, Multimodal Geometric Augmentation (MGAug), that for the first time generates augmenting transformations in a multimodal latent space of geometric deformations. To achieve this, we first develop a deep network that embeds the learning of latent geometric spaces of diffeomorphic transformations (a.k.a. diffeomorphisms) in a variational autoencoder (VAE). A mixture of multivariate Gaussians is formulated in the tangent space of diffeomorphisms and serves as a prior to approximate the hidden distribution of image transformations. We then augment the original training dataset by deforming images with transformations randomly sampled from the learned multimodal latent space of the VAE. To validate the effectiveness of our model, we jointly learn the augmentation strategy with two distinct domain-specific tasks: multi-class classification on both synthetic 2D and real 3D brain MRIs, and segmentation on a real 3D brain MRI dataset. We also compare MGAug with state-of-the-art transformation-based image augmentation algorithms. Experimental results show that our proposed approach outperforms all baselines with significantly improved prediction accuracy. Our code is publicly available on GitHub.
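The abstract describes the augmentation loop only at a high level: draw a latent code from a Gaussian-mixture prior, map it to a velocity field in the tangent space, integrate that field into a diffeomorphism, and warp the training image. The NumPy/SciPy sketch below illustrates that loop in 2D under loudly stated assumptions. The learned VAE decoder is replaced by a hypothetical fixed basis of smoothed random vector fields (`smooth_basis`); the mixture weights, means, and covariances are hand-picked rather than learned; and the velocity field is integrated by scaling and squaring of a stationary field, a common stand-in that is not necessarily the integration scheme MGAug uses. All function names are illustrative and do not come from the authors' code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates


def sample_gmm_prior(weights, means, covs, n, rng):
    """Draw n latent codes from a Gaussian mixture prior; also return component ids."""
    comps = rng.choice(len(weights), size=n, p=weights)
    z = np.stack([rng.multivariate_normal(means[k], covs[k]) for k in comps])
    return z, comps


def smooth_basis(d, shape, rng, sigma=4.0, scale=20.0):
    """Hypothetical stand-in for a learned decoder: d smooth 2D vector fields (d, 2, H, W)."""
    noise = rng.standard_normal((d, 2) + tuple(shape))
    # Smooth only the spatial axes; sigma 0 leaves the first two axes untouched.
    return scale * gaussian_filter(noise, sigma=(0, 0, sigma, sigma))


def resample(field, disp):
    """Evaluate each channel of `field` (C, H, W) at x + disp(x) with bilinear interpolation."""
    h, w = field.shape[-2:]
    coords = np.mgrid[0:h, 0:w].astype(float) + disp
    return np.stack([map_coordinates(f, coords, order=1, mode="nearest") for f in field])


def exp_velocity(v, steps=6):
    """Scaling and squaring: displacement u of phi = exp(v), where phi(x) = x + u(x)."""
    u = v / 2.0 ** steps
    for _ in range(steps):
        u = u + resample(u, u)  # phi <- phi o phi, i.e. u(x) + u(x + u(x))
    return u


def augment(image, n, weights, means, covs, basis, rng):
    """Warp `image` (H, W) with n deformations sampled from the mixture prior."""
    z, comps = sample_gmm_prior(weights, means, covs, n, rng)
    out = []
    for zi in z:
        v = np.tensordot(zi, basis, axes=1)  # latent code -> velocity field (2, H, W)
        u = exp_velocity(v)
        out.append(resample(image[None], u)[0])  # backward warp: I(phi(x))
    return np.stack(out), comps


# Usage: a toy square image and a two-component prior over a 4-D latent space.
rng = np.random.default_rng(0)
H = W = 64
image = np.zeros((H, W))
image[20:44, 20:44] = 1.0
d = 4
basis = smooth_basis(d, (H, W), rng)
weights = np.array([0.5, 0.5])
means = np.array([np.full(d, 1.0), np.full(d, -1.0)])
covs = np.array([0.1 * np.eye(d)] * 2)
aug, comps = augment(image, 8, weights, means, covs, basis, rng)
print(aug.shape)  # (8, 64, 64)
```

Returning the component id alongside each augmented image makes it easy to check that the samples cover every mode of the prior, which is the motivation for using a mixture rather than a single Gaussian.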


Source journal

Medical Image Analysis (Engineering & Technology / Engineering: Biomedical)

CiteScore: 22.10

Self-citation rate: 6.40%

Articles published: 309

Review time: 6.6 months

Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.