Context-Aware Transformer GAN for Direct Generation of Attenuation and Scatter Corrected PET Data
Mojtaba Jafaritadi; Emily Anaya; Garry Chinn; Jarrett Rosenberg; Tie Liang; Craig S. Levin
IEEE Transactions on Radiation and Plasma Medical Sciences, published 2024-03-06
DOI: 10.1109/TRPMS.2024.3397318
https://ieeexplore.ieee.org/document/10521624/
Citations: 0
Abstract
We present a context-aware generative deep learning framework to produce photon attenuation and scatter corrected (ASC) positron emission tomography (PET) images directly from nonattenuation and nonscatter corrected (NASC) images. We trained conditional generative adversarial networks (cGANs) on either single-modality (NASC) or multimodality (NASC+MRI) input data to map NASC images to pixel-wise continuously valued ASC PET images. We designed and evaluated four cGAN models: Pix2Pix, attention-guided cGAN (AG-Pix2Pix), vision transformer cGAN (ViT-GAN), and shifted window transformer cGAN (Swin-GAN). Retrospective 18F-fluorodeoxyglucose (18F-FDG) full-body PET images from 33 subjects were collected and analyzed. Notably, as a particular strength of this work, each patient in the study underwent both a PET/CT scan and a multisequence PET/MRI scan on the same day, giving us a gold standard from the former as we investigate ASC for the latter. Quantitative analysis, evaluating image quality with peak signal-to-noise ratio (PSNR), multiscale structural similarity index (MS-SSIM), normalized root mean-squared error (NRMSE), and mean absolute error (MAE) metrics, showed no significant impact of input type on PSNR ($p=0.95$), MS-SSIM ($p=0.083$), NRMSE ($p=0.72$), or MAE ($p=0.70$). For multimodal input data, Swin-GAN outperformed Pix2Pix ($p=0.023$) and AG-Pix2Pix ($p < 0.001$) in PSNR, but not ViT-GAN ($p=0.154$). Swin-GAN achieved significantly higher MS-SSIM than ViT-GAN ($p=0.007$) and AG-Pix2Pix ($p=0.002$). Multimodal Swin-GAN demonstrated reduced NRMSE and MAE compared to ViT-GAN ($p=0.023$ and $p=0.031$, respectively) and AG-Pix2Pix (both $p < 0.001$), with marginal improvement over Pix2Pix ($p < 0.064$). The cGAN models, in particular Swin-GAN, consistently generated reliable and accurate ASC PET images, whether using multimodal or single-modal input data. The findings indicate that this methodology can be used to generate ASC data from standalone PET scanners or integrated PET/MRI systems, without relying on transmission scan-based attenuation maps.
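Three of the image-quality metrics cited in the abstract (PSNR, NRMSE, MAE) have short closed-form definitions and can be sketched in NumPy. This is a generic illustration of the standard formulas, not the authors' evaluation code; in particular, the choice to normalize NRMSE by the reference image's intensity range is one common convention among several (mean- or Euclidean-norm-based normalizations also appear in the literature), and the toy arrays below are made up for demonstration.

```python
import numpy as np


def psnr(ref: np.ndarray, img: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB, relative to a given data range."""
    mse = np.mean((ref - img) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)


def nrmse(ref: np.ndarray, img: np.ndarray) -> float:
    """Root mean-squared error, normalized by the reference intensity range
    (one common convention; others normalize by the mean or the norm)."""
    rmse = np.sqrt(np.mean((ref - img) ** 2))
    return rmse / (ref.max() - ref.min())


def mae(ref: np.ndarray, img: np.ndarray) -> float:
    """Mean absolute error between reference and predicted images."""
    return float(np.mean(np.abs(ref - img)))


# Toy example: a hypothetical ground-truth ASC slice vs. a noisy prediction.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
pred = np.clip(ref + rng.normal(0.0, 0.01, ref.shape), 0.0, 1.0)
print(f"PSNR  = {psnr(ref, pred):.2f} dB")
print(f"NRMSE = {nrmse(ref, pred):.4f}")
print(f"MAE   = {mae(ref, pred):.4f}")
```

MS-SSIM is omitted here because it requires multiscale Gaussian filtering and is usually taken from a library (e.g. an image-processing toolkit) rather than reimplemented.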