DuDoCROP: Dual-Domain CLIP-Assisted Residual Optimization Perception Model for CT Metal Artifact Reduction

Xinrui Zhang, Ailong Cai, Lei Li, Bin Yan
arXiv - PHYS - Medical Physics, published 2024-08-14. DOI: arxiv-2408.14342 (https://doi.org/arxiv-2408.14342)

Abstract
Metal artifacts in computed tomography (CT) imaging pose significant challenges to accurate clinical diagnosis. High-density metallic implants produce artifacts that degrade image quality, manifesting as streaking, blurring, or beam hardening effects. Various deep learning-based approaches, particularly generative models, have been proposed for metal artifact reduction (MAR). However, these methods have limited ability to perceive the diverse morphologies of different metal implants and their artifacts, so they may generate spurious anatomical structures and generalize poorly. To address these issues, we leverage a visual-language model (VLM) to identify these morphological features and introduce them into a dual-domain CLIP-assisted residual optimization perception model (DuDoCROP) for MAR. Specifically, a dual-domain CLIP (DuDoCLIP) is fine-tuned on the image domain and sinogram domain using contrastive learning to extract semantic descriptions from anatomical structures and metal artifacts. Subsequently, a diffusion model is guided by the embeddings of DuDoCLIP, enabling dual-domain prior generation. Additionally, we design prompt engineering for more precise image-text descriptions that can enhance the model's perception capability. Then, a downstream task is devised for the one-step residual optimization and integration of dual-domain priors, while incorporating raw data fidelity. Finally, a new perceptual indicator is proposed to validate the model's perception and generation performance. With the assistance of DuDoCLIP, our DuDoCROP exhibits at least 63.7% higher generalization capability compared to the baseline model. Numerical experiments demonstrate that the proposed method can generate more realistic image structures and outperforms other state-of-the-art (SOTA) approaches both qualitatively and quantitatively.
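
The abstract says DuDoCLIP is fine-tuned with contrastive learning but does not give the loss. As a rough illustration only, the sketch below implements the standard CLIP-style symmetric contrastive objective on paired embeddings, where row k of each list is assumed to be a matched image (or sinogram) embedding and text embedding. The function names, the toy embeddings, and the temperature of 0.07 are assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over one list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over cosine-similarity logits.

    Hypothetical helper: pair k (image_embs[k], text_embs[k]) is treated
    as the positive, every other pairing in the batch as a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: the same, computed over columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Toy two-pair batch: aligned pairs versus deliberately swapped pairs.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Minimizing this loss pulls each embedding toward its paired description and pushes it away from the other descriptions in the batch. In a dual-domain setting one could apply the same objective separately to image-domain and sinogram-domain encoders, although how the paper combines the two domains is not stated in the abstract.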