Context-Aware Transformer GAN for Direct Generation of Attenuation and Scatter Corrected PET Data
Mojtaba Jafaritadi; Emily Anaya; Garry Chinn; Jarrett Rosenberg; Tie Liang; Craig S. Levin
IEEE Transactions on Radiation and Plasma Medical Sciences, published 2024-03-06
DOI: 10.1109/TRPMS.2024.3397318
https://ieeexplore.ieee.org/document/10521624/
Citations: 0
Abstract
We present a context-aware generative deep learning framework to produce photon attenuation and scatter corrected (ASC) positron emission tomography (PET) images directly from nonattenuation and nonscatter corrected (NASC) images. We trained conditional generative adversarial networks (cGANs) on either single-modality (NASC) or multimodality (NASC+MRI) input data to map NASC images to pixel-wise continuously valued ASC PET images. We designed and evaluated four cGAN models: Pix2Pix, attention-guided cGAN (AG-Pix2Pix), vision transformer cGAN (ViT-GAN), and shifted window transformer cGAN (Swin-GAN). Retrospective 18F-fluorodeoxyglucose (18F-FDG) full-body PET images from 33 subjects were collected and analyzed. Notably, as a particular strength of this work, each patient in the study underwent both a PET/CT scan and a multisequence PET/MRI scan on the same day, giving us a gold standard from the former as we investigate ASC for the latter. Quantitative analysis, evaluating image quality with peak signal-to-noise ratio (PSNR), multiscale structural similarity index (MS-SSIM), normalized root mean-squared error (NRMSE), and mean absolute error (MAE) metrics, showed no significant impact of input type on PSNR ($p=0.95$), MS-SSIM ($p=0.083$), NRMSE ($p=0.72$), or MAE ($p=0.70$). For multimodal input data, Swin-GAN outperformed Pix2Pix ($p=0.023$) and AG-Pix2Pix ($p < 0.001$) in PSNR, but not ViT-GAN ($p=0.154$). Swin-GAN achieved significantly higher MS-SSIM than ViT-GAN ($p=0.007$) and AG-Pix2Pix ($p=0.002$). Multimodal Swin-GAN demonstrated reduced NRMSE and MAE compared to ViT-GAN ($p=0.023$ and $p=0.031$, respectively) and AG-Pix2Pix (both $p < 0.001$), with marginal improvement over Pix2Pix ($p < 0.064$). The cGAN models, in particular Swin-GAN, consistently generated reliable and accurate ASC PET images, whether using multimodal or single-modal input data. The findings indicate that this methodology can be used to generate ASC data from standalone PET scanners or integrated PET/MRI systems, without relying on transmission scan-based attenuation maps.
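Three of the image-quality metrics cited in the abstract (PSNR, NRMSE, MAE) have short closed-form definitions and can be sketched in NumPy. This is a generic illustration of the standard formulas, not the authors' evaluation code; in particular, the choice to normalize NRMSE by the reference image's intensity range is one common convention among several (mean- or Euclidean-norm-based normalizations also appear in the literature), and the toy arrays below are made up for demonstration.

```python
import numpy as np


def psnr(ref: np.ndarray, img: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, relative to a given data range."""
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)


def nrmse(ref: np.ndarray, img: np.ndarray) -> float:
    """Root mean-squared error, normalized by the reference intensity range
    (one common convention; others normalize by the mean or the norm)."""
    rmse = np.sqrt(np.mean((ref - img) ** 2))
    return rmse / (ref.max() - ref.min())


def mae(ref: np.ndarray, img: np.ndarray) -> float:
    """Mean absolute error between reference and predicted images."""
    return float(np.mean(np.abs(ref - img)))


# Toy example: a hypothetical ground-truth ASC slice vs. a noisy prediction.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
pred = np.clip(ref + rng.normal(0.0, 0.01, ref.shape), 0.0, 1.0)
print(f"PSNR  = {psnr(ref, pred):.2f} dB")
print(f"NRMSE = {nrmse(ref, pred):.4f}")
print(f"MAE   = {mae(ref, pred):.4f}")
```

MS-SSIM is omitted here because it requires multiscale Gaussian filtering and is usually taken from a library (e.g. an image-processing toolkit) rather than reimplemented.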