Context-Aware Transformer GAN for Direct Generation of Attenuation and Scatter Corrected PET Data

IF 4.6 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Mojtaba Jafaritadi;Emily Anaya;Garry Chinn;Jarrett Rosenberg;Tie Liang;Craig S. Levin
Journal: IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 8, no. 6, pp. 677-689
DOI: 10.1109/TRPMS.2024.3397318
Published: 2024-03-06
URL: https://ieeexplore.ieee.org/document/10521624/

Abstract

We present a context-aware generative deep learning framework that produces photon attenuation and scatter corrected (ASC) positron emission tomography (PET) images directly from non-attenuation and non-scatter corrected (NASC) images. We trained conditional generative adversarial networks (cGANs) on either single-modality (NASC) or multimodality (NASC+MRI) input data to map NASC images to pixel-wise, continuously valued ASC PET images. We designed and evaluated four cGAN models: Pix2Pix, attention-guided cGAN (AG-Pix2Pix), vision transformer cGAN (ViT-GAN), and shifted window transformer cGAN (Swin-GAN). Retrospective 18F-fluorodeoxyglucose (18F-FDG) full-body PET images from 33 subjects were collected and analyzed. A particular strength of this work is that each patient underwent both a PET/CT scan and a multisequence PET/MRI scan on the same day, so the former provides a gold standard for investigating ASC in the latter. Image quality was evaluated quantitatively using peak signal-to-noise ratio (PSNR), multiscale structural similarity index (MS-SSIM), normalized root-mean-squared error (NRMSE), and mean absolute error (MAE). Input type had no significant impact on PSNR ($p=0.95$), MS-SSIM ($p=0.083$), NRMSE ($p=0.72$), or MAE ($p=0.70$). For multimodal input data, Swin-GAN outperformed Pix2Pix ($p=0.023$) and AG-Pix2Pix ($p<0.001$), but not ViT-GAN ($p=0.154$), in PSNR. Swin-GAN achieved significantly higher MS-SSIM than ViT-GAN ($p=0.007$) and AG-Pix2Pix ($p=0.002$). Multimodal Swin-GAN showed reduced NRMSE and MAE compared to ViT-GAN ($p=0.023$ and $p=0.031$, respectively) and AG-Pix2Pix (both $p<0.001$), with marginal improvement over Pix2Pix ($p<0.064$). The cGAN models, in particular Swin-GAN, consistently generated reliable and accurate ASC PET images with either multimodal or single-modal input data. The findings indicate that this methodology can be used to generate ASC data from standalone PET scanners or integrated PET/MRI systems without relying on transmission scan-based attenuation maps.
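To make the pixel-wise metrics named in the abstract concrete, the following is a minimal sketch of how MAE, NRMSE, and PSNR might be computed between a reference ASC image and a generated one. It is not the authors' evaluation code. The function name `image_metrics` is hypothetical, and two conventions are assumed here because the abstract does not state them: NRMSE is normalized by the intensity range of the reference image, and PSNR uses the reference maximum as the peak value. MS-SSIM is omitted because it requires a multiscale filtering pipeline. Plain Python lists are used to keep the example dependency-free; an array library would be the usual choice in practice.

```python
import math


def _flatten(image):
    """Flatten a 2-D image (a list of rows) into one list of floats."""
    return [float(v) for row in image for v in row]


def image_metrics(reference, generated):
    """Return MAE, NRMSE, and PSNR (in dB) for two equally sized 2-D images.

    Assumptions (not taken from the paper): NRMSE is normalized by the
    intensity range of the reference, and PSNR uses the reference maximum
    as the peak value.
    """
    ref = _flatten(reference)
    gen = _flatten(generated)
    if not ref or len(ref) != len(gen):
        raise ValueError("images must be non-empty and the same size")

    n = len(ref)
    mae = sum(abs(r - g) for r, g in zip(ref, gen)) / n
    mse = sum((r - g) ** 2 for r, g in zip(ref, gen)) / n
    rmse = math.sqrt(mse)

    # Range normalization is undefined for a constant reference image.
    value_range = max(ref) - min(ref)
    nrmse = rmse / value_range if value_range > 0 else float("nan")

    peak = max(ref)
    if mse == 0:
        psnr = float("inf")  # identical images
    elif peak <= 0:
        psnr = float("nan")  # peak-based PSNR is undefined here
    else:
        psnr = 20.0 * math.log10(peak / rmse)

    return {"mae": mae, "nrmse": nrmse, "psnr": psnr}


if __name__ == "__main__":
    reference = [[0.0, 1.0], [2.0, 4.0]]
    generated = [[0.0, 1.0], [2.0, 2.0]]
    print(image_metrics(reference, generated))
```

For the toy pair above, only one of four pixels differs (by 2), so the MAE is 0.5, the RMSE is 1, and the NRMSE is 0.25 under the range normalization assumed here. Lower MAE and NRMSE and higher PSNR indicate closer agreement with the reference.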
Source journal: IEEE Transactions on Radiation and Plasma Medical Sciences (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 8.00
Self-citation rate: 18.20%
Articles published: 109