通过变分自动编码器潜在空间映射的多模态医学图像到图像的转换。

IF 3.2 2区医学 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Medical physics Pub Date : 2025-05-29 DOI:10.1002/mp.17912

Zhiwen Liang, Mengjie Cheng, Jinhui Ma, Ying Hu, Song Li, Xin Tian

{"title":"通过变分自动编码器潜在空间映射的多模态医学图像到图像的转换。","authors":"Zhiwen Liang, Mengjie Cheng, Jinhui Ma, Ying Hu, Song Li, Xin Tian","doi":"10.1002/mp.17912","DOIUrl":null,"url":null,"abstract":"<div>\n \n \n <section>\n \n <h3> Background</h3>\n \n <p>Medical image translation has become an essential tool in modern radiotherapy, providing complementary information for target delineation and dose calculation. However, current approaches are constrained by their modality-specific nature, requiring separate model training for each pair of imaging modalities. This limitation hinders the efficient deployment of comprehensive multimodal solutions in clinical practice.</p>\n </section>\n \n <section>\n \n <h3> Purpose</h3>\n \n <p>To develop a unified image translation method using variational autoencoder (VAE) latent space mapping, which enables flexible conversion between different medical imaging modalities to meet clinical demands.</p>\n </section>\n \n <section>\n \n <h3> Methods</h3>\n \n <p>We propose a three-stage approach to construct a unified image translation model. Initially, a VAE is trained to learn a shared latent space for various medical images. A stacked bidirectional transformer is subsequently utilized to learn the mapping between different modalities within the latent space under the guidance of the image modality. Finally, the VAE decoder is fine-tuned to improve image quality. Our internal dataset collected paired imaging data from 87 head and neck cases, with each case containing cone beam computed tomography (CBCT), computed tomography (CT), MR T1c, and MR T2W images. The effectiveness of this strategy is quantitatively evaluated on our internal dataset and a public dataset by the mean absolute error (MAE), peak-signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Additionally, the dosimetry characteristics of the synthetic CT images are evaluated, and subjective quality assessments of the synthetic MR images are conducted to determine their clinical value.</p>\n </section>\n \n <section>\n \n <h3> Results</h3>\n \n <p>The VAE with the Kullback‒Leibler (KL)-16 image tokenizer demonstrates superior image reconstruction ability, achieving a Fréchet inception distance (FID) of 4.84, a PSNR of 32.80 dB, and an SSIM of 92.33%. In synthetic CT tasks, the model shows greater accuracy in intramodality translations than in cross-modality translations, as evidenced by an MAE of 21.60 ± 8.80 Hounsfield unit (HU) in the CBCT-to-CT task and 45.23 ± 13.21 HU/47.55 ± 13.88 in the MR T1c/T2w-to-CT tasks. For the cross-contrast MR translation tasks, the results are very close, with mean PSNR and SSIM values of 26.33 ± 1.36 dB and 85.21% ± 2.21%, respectively, for the T1c-to-T2w translation and 26.03 ± 1.67 dB and 85.73% ± 2.66%, respectively, for the T2w-to-T1c translation. Dosimetric results indicate that all the gamma pass rates for synthetic CTs are higher than 99% for photon intensity-modulated radiation therapy (IMRT) planning. However, the subjective quality assessment scores for synthetic MR images are lower than those for real MR images.</p>\n </section>\n \n <section>\n \n <h3> Conclusions</h3>\n \n <p>The proposed three-stage approach successfully develops a unified image translation model that can effectively handle a wide range of medical image translation tasks. This flexibility and effectiveness make it a valuable tool for clinical applications.</p>\n </section>\n </div>","PeriodicalId":18384,"journal":{"name":"Medical physics","volume":"52 7","pages":""},"PeriodicalIF":3.2000,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multimodal medical image-to-image translation via variational autoencoder latent space mapping\",\"authors\":\"Zhiwen Liang, Mengjie Cheng, Jinhui Ma, Ying Hu, Song Li, Xin Tian\",\"doi\":\"10.1002/mp.17912\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n \\n <section>\\n \\n <h3> Background</h3>\\n \\n <p>Medical image translation has become an essential tool in modern radiotherapy, providing complementary information for target delineation and dose calculation. However, current approaches are constrained by their modality-specific nature, requiring separate model training for each pair of imaging modalities. This limitation hinders the efficient deployment of comprehensive multimodal solutions in clinical practice.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Purpose</h3>\\n \\n <p>To develop a unified image translation method using variational autoencoder (VAE) latent space mapping, which enables flexible conversion between different medical imaging modalities to meet clinical demands.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Methods</h3>\\n \\n <p>We propose a three-stage approach to construct a unified image translation model. Initially, a VAE is trained to learn a shared latent space for various medical images. A stacked bidirectional transformer is subsequently utilized to learn the mapping between different modalities within the latent space under the guidance of the image modality. Finally, the VAE decoder is fine-tuned to improve image quality. Our internal dataset collected paired imaging data from 87 head and neck cases, with each case containing cone beam computed tomography (CBCT), computed tomography (CT), MR T1c, and MR T2W images. The effectiveness of this strategy is quantitatively evaluated on our internal dataset and a public dataset by the mean absolute error (MAE), peak-signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Additionally, the dosimetry characteristics of the synthetic CT images are evaluated, and subjective quality assessments of the synthetic MR images are conducted to determine their clinical value.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Results</h3>\\n \\n <p>The VAE with the Kullback‒Leibler (KL)-16 image tokenizer demonstrates superior image reconstruction ability, achieving a Fréchet inception distance (FID) of 4.84, a PSNR of 32.80 dB, and an SSIM of 92.33%. In synthetic CT tasks, the model shows greater accuracy in intramodality translations than in cross-modality translations, as evidenced by an MAE of 21.60 ± 8.80 Hounsfield unit (HU) in the CBCT-to-CT task and 45.23 ± 13.21 HU/47.55 ± 13.88 in the MR T1c/T2w-to-CT tasks. For the cross-contrast MR translation tasks, the results are very close, with mean PSNR and SSIM values of 26.33 ± 1.36 dB and 85.21% ± 2.21%, respectively, for the T1c-to-T2w translation and 26.03 ± 1.67 dB and 85.73% ± 2.66%, respectively, for the T2w-to-T1c translation. Dosimetric results indicate that all the gamma pass rates for synthetic CTs are higher than 99% for photon intensity-modulated radiation therapy (IMRT) planning. However, the subjective quality assessment scores for synthetic MR images are lower than those for real MR images.</p>\\n </section>\\n \\n <section>\\n \\n <h3> Conclusions</h3>\\n \\n <p>The proposed three-stage approach successfully develops a unified image translation model that can effectively handle a wide range of medical image translation tasks. This flexibility and effectiveness make it a valuable tool for clinical applications.</p>\\n </section>\\n </div>\",\"PeriodicalId\":18384,\"journal\":{\"name\":\"Medical physics\",\"volume\":\"52 7\",\"pages\":\"\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-05-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Medical physics\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/mp.17912\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical physics","FirstCategoryId":"3","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/mp.17912","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}

引用次数: 0

摘要

背景：医学图像翻译已成为现代放射治疗的重要工具，为靶区描绘和剂量计算提供了补充信息。然而，目前的方法受到模式特异性的限制，需要对每对成像模式进行单独的模型训练。这一限制阻碍了临床实践中综合多模式解决方案的有效部署。目的：建立一种基于变分自编码器（VAE）潜空间映射的统一图像转换方法，实现不同医学成像模式之间的灵活转换，以满足临床需求。方法：我们提出了一个三阶段的方法来构建统一的图像翻译模型。首先，训练VAE学习各种医学图像的共享潜在空间。然后利用堆叠的双向变压器在图像模态的引导下学习潜在空间内不同模态之间的映射。最后，对VAE解码器进行微调以提高图像质量。我们的内部数据集收集了87例头颈部病例的成对成像数据，每个病例都包含锥束计算机断层扫描（CBCT）、计算机断层扫描（CT）、MR T1c和MR T2W图像。通过平均绝对误差（MAE）、峰值信噪比（PSNR）和结构相似性指数（SSIM），在我们的内部数据集和公共数据集上定量评估了该策略的有效性。此外，对合成CT图像的剂量学特征进行评估，并对合成MR图像进行主观质量评估，以确定其临床价值。结果：采用Kullback-Leibler (KL)-16图像标记器的VAE具有较好的图像重建能力，获得了4.84的起始距离（FID）、32.80 dB的PSNR和92.33%的SSIM。在合成CT任务中，该模型在模态内翻译方面的准确率高于跨模态翻译，CBCT-to-CT任务的MAE为21.60±8.80 Hounsfield单位（HU）， MR T1c/T2w-to-CT任务的MAE为45.23±13.21 HU/47.55±13.88。对于交叉对比MR翻译任务，结果非常接近，t1 - t2w翻译的平均PSNR和SSIM值分别为26.33±1.36 dB和85.21%±2.21%，t2w - t1c翻译的平均PSNR和SSIM值分别为26.03±1.67 dB和85.73%±2.66%。剂量学结果表明，在光子强度调制放射治疗（IMRT）计划中，所有合成ct的伽马通过率均高于99%。然而，合成磁共振图像的主观质量评价分数低于真实磁共振图像。结论：所提出的三阶段方法成功地建立了一个统一的图像翻译模型，可以有效地处理广泛的医学图像翻译任务。这种灵活性和有效性使其成为临床应用的宝贵工具。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Multimodal medical image-to-image translation via variational autoencoder latent space mapping

Background

Medical image translation has become an essential tool in modern radiotherapy, providing complementary information for target delineation and dose calculation. However, current approaches are constrained by their modality-specific nature, requiring separate model training for each pair of imaging modalities. This limitation hinders the efficient deployment of comprehensive multimodal solutions in clinical practice.

Purpose

To develop a unified image translation method using variational autoencoder (VAE) latent space mapping, which enables flexible conversion between different medical imaging modalities to meet clinical demands.

Methods

We propose a three-stage approach to construct a unified image translation model. Initially, a VAE is trained to learn a shared latent space for various medical images. A stacked bidirectional transformer is subsequently utilized to learn the mapping between different modalities within the latent space under the guidance of the image modality. Finally, the VAE decoder is fine-tuned to improve image quality. Our internal dataset collected paired imaging data from 87 head and neck cases, with each case containing cone beam computed tomography (CBCT), computed tomography (CT), MR T1c, and MR T2W images. The effectiveness of this strategy is quantitatively evaluated on our internal dataset and a public dataset by the mean absolute error (MAE), peak-signal-to-noise ratio (PSNR), and structural similarity index (SSIM). Additionally, the dosimetry characteristics of the synthetic CT images are evaluated, and subjective quality assessments of the synthetic MR images are conducted to determine their clinical value.

Results

The VAE with the Kullback‒Leibler (KL)-16 image tokenizer demonstrates superior image reconstruction ability, achieving a Fréchet inception distance (FID) of 4.84, a PSNR of 32.80 dB, and an SSIM of 92.33%. In synthetic CT tasks, the model shows greater accuracy in intramodality translations than in cross-modality translations, as evidenced by an MAE of 21.60 ± 8.80 Hounsfield unit (HU) in the CBCT-to-CT task and 45.23 ± 13.21 HU/47.55 ± 13.88 in the MR T1c/T2w-to-CT tasks. For the cross-contrast MR translation tasks, the results are very close, with mean PSNR and SSIM values of 26.33 ± 1.36 dB and 85.21% ± 2.21%, respectively, for the T1c-to-T2w translation and 26.03 ± 1.67 dB and 85.73% ± 2.66%, respectively, for the T2w-to-T1c translation. Dosimetric results indicate that all the gamma pass rates for synthetic CTs are higher than 99% for photon intensity-modulated radiation therapy (IMRT) planning. However, the subjective quality assessment scores for synthetic MR images are lower than those for real MR images.

Conclusions

The proposed three-stage approach successfully develops a unified image translation model that can effectively handle a wide range of medical image translation tasks. This flexibility and effectiveness make it a valuable tool for clinical applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Medical physics 医学-核医学

CiteScore

6.80

自引率

15.80%

发文量

660

审稿时长

1.7 months

期刊介绍： Medical Physics publishes original, high impact physics, imaging science, and engineering research that advances patient diagnosis and therapy through contributions in 1) Basic science developments with high potential for clinical translation 2) Clinical applications of cutting edge engineering and physics innovations 3) Broadly applicable and innovative clinical physics developments Medical Physics is a journal of global scope and reach. By publishing in Medical Physics your research will reach an international, multidisciplinary audience including practicing medical physicists as well as physics- and engineering based translational scientists. We work closely with authors of promising articles to improve their quality.