Authors: Xinrui Zhang, Ailong Cai, Lei Li, Bin Yan
DOI: arxiv-2408.14342
URL: https://doi.org/arxiv-2408.14342
Journal: arXiv - PHYS - Medical Physics
Published: 2024-08-14
Citation count: 0
DuDoCROP: Dual-Domain CLIP-Assisted Residual Optimization Perception Model for CT Metal Artifact Reduction
Metal artifacts in computed tomography (CT) imaging pose significant
challenges to accurate clinical diagnosis. High-density metallic implants
produce artifacts that degrade image quality, appearing as streaking,
blurring, or beam-hardening effects. Various deep learning-based approaches,
particularly generative models, have been proposed for metal artifact
reduction (MAR). However, these methods have limited ability to perceive the
diverse morphologies of metal implants and their artifacts, so they may
generate spurious anatomical structures and generalize poorly. To address these
issues, we leverage a visual-language model (VLM) to identify these
morphological features and introduce them into a dual-domain CLIP-assisted
residual optimization perception model (DuDoCROP) for MAR. Specifically, a
dual-domain CLIP (DuDoCLIP) is fine-tuned in both the image domain and the
sinogram domain using contrastive learning to extract semantic descriptions of
anatomical structures and metal artifacts. A diffusion model is then guided by
the DuDoCLIP embeddings, enabling dual-domain prior generation.
Additionally, we use prompt engineering to produce more precise image-text
descriptions that can enhance the model's perception capability. A downstream
task is then devised for one-step residual optimization and integration of
the dual-domain priors, while incorporating raw data fidelity.
Finally, a new perceptual indicator is proposed to validate the model's
perception and generation performance. With the assistance of DuDoCLIP,
DuDoCROP exhibits at least 63.7% higher generalization capability than the
baseline model. Numerical experiments demonstrate that the proposed method
generates more realistic image structures and outperforms other
state-of-the-art (SOTA) approaches both qualitatively and quantitatively.
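
The abstract states that DuDoCLIP is fine-tuned with contrastive learning but
gives no formula. As a rough illustration only, the sketch below shows the
generic symmetric contrastive loss commonly used for CLIP-style image-text
training. It is not the authors' implementation: the function names, the
temperature value of 0.07, and the use of plain Python lists as embeddings are
all assumptions made for this example.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _diag_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Hypothetical helper: image_embs[i] and text_embs[i] are assumed to
    describe the same sample (for example, a CT slice and its text prompt).
    """
    images = [_normalize(v) for v in image_embs]
    texts = [_normalize(v) for v in text_embs]
    # Pairwise cosine similarities, sharpened by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_diag_cross_entropy(logits) + _diag_cross_entropy(logits_t))
```

The loss is small when each image embedding is most similar to its own text
embedding and grows when the pairs are mismatched. In a dual-domain setting,
one such loss could plausibly be applied per domain (image and sinogram), but
the abstract does not specify how the two are combined.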