A hybrid training database and evaluation benchmark for assessing metal artifact reduction methods for X-ray CT imaging.

Impact factor: 3.2

Medical Physics, 52(10): e70020. Published 2025-10-01. DOI: 10.1002/mp.70020

Nils Peters, Eri Haneda, Jiayong Zhang, Grigorios Karageorgos, Wenjun Xia, Joost Verburg, Ge Wang, Harald Paganetti, Bruno De Man


Citations: 0

Abstract

Background: Metal artifacts significantly degrade image quality in computed tomography (CT), obscuring or even mimicking pathology. Although many algorithms for metal artifact reduction (MAR) have been proposed, no comprehensive, clinically relevant evaluation benchmark exists. A major reason for this is the lack of artifact-free ground truth data in clinical cases. Similarly, deep-learning-based algorithms are hindered by the lack of paired training datasets with and without artifacts.

Purpose: We propose simulating a large training database for deep-learning-based MAR algorithms and defining a comprehensive evaluation benchmark for MAR. To this end, we use and validate a framework for the realistic simulation of metal artifacts on clinical CT data.
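The notion of a "paired" training database can be made concrete with a small sketch. It assumes that each training example couples an artifact-free clinical image (the target) with the same image after metal has been inserted and artifacts simulated (the network input). The `PairedSample` fields, `build_pairs`, and the `simulate` callable are hypothetical names for illustration; `simulate` stands in for a physics-based simulation run and is not an XCIST API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = List[List[float]]  # 2D image as rows of CT numbers (HU)


@dataclass(frozen=True)
class PairedSample:
    """One supervised MAR training example; field names are hypothetical."""
    region: str            # anatomical region, e.g. "head", "thorax", "pelvis"
    metal_mask: Image      # 1.0 where metal is placed, 0.0 elsewhere
    with_artifacts: Image  # simulated image with metal artifacts (network input)
    reference: Image       # artifact-free image from the clinical database (target)


def build_pairs(
    clean_images: Sequence[Image],
    metal_masks: Sequence[Image],
    region: str,
    simulate: Callable[[Image, Image], Image],
) -> List[PairedSample]:
    """Pair every clean image with every metal mask via a user-supplied simulator.

    `simulate(image, mask)` is a placeholder for a physics-based simulation;
    it is NOT a function provided by CatSim/XCIST.
    """
    pairs = []
    for image in clean_images:
        for mask in metal_masks:
            pairs.append(
                PairedSample(
                    region=region,
                    metal_mask=mask,
                    with_artifacts=simulate(image, mask),
                    reference=image,
                )
            )
    return pairs


def toy_simulate(image: Image, mask: Image) -> Image:
    """Stand-in simulator: adds 3000 HU to metal pixels; models no artifact physics."""
    return [[p + 3000.0 * m for p, m in zip(row, mrow)] for row, mrow in zip(image, mask)]
```

Crossing every clean image with every metal mask is one simple way for a database to grow combinatorially. In practice `toy_simulate` would be replaced by a full projection-domain simulation that produces the streaks, and the same reference image serves as ground truth for all of its metal variants.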

Methods: A clinical and a generic CT scanner geometry are modeled in the CatSim CT simulator within the open-access toolkit XCIST. Because most MAR research is performed in 2D, all datasets are simulated in 2D. The metal artifact simulation capability is validated experimentally with CT phantom scans containing various metal types and geometries. The tool is then used with two public CT databases to simulate metal artifact scenarios as training data for deep-learning algorithms. Finally, a benchmark is defined for clinically realistic metal artifact scenarios and applied to one numerical and one deep-learning-based MAR algorithm.
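One reason a realistic simulator must model the X-ray spectrum is beam hardening, a well-known contributor to metal artifacts alongside effects such as photon starvation. The sketch below assumes a hypothetical two-bin spectrum with illustrative attenuation values (not taken from CatSim/XCIST or any measured spectrum). It shows that the log-converted polychromatic projection grows sublinearly with path length through metal, whereas filtered backprojection implicitly assumes linear line integrals; this inconsistency is what produces dark streaks between metal objects.

```python
import math

# Hypothetical two-bin spectrum: (relative photon weight, metal attenuation in 1/cm).
# Illustrative values only; NOT from CatSim/XCIST or a measured spectrum.
SPECTRUM = [
    (0.5, 5.0),  # low-energy bin: strongly attenuated
    (0.5, 1.0),  # high-energy bin: weakly attenuated
]


def polychromatic_projection(path_length_cm, spectrum=SPECTRUM):
    """Log-converted measurement -ln(I/I0) for a polychromatic beam."""
    total_weight = sum(w for w, _ in spectrum)
    transmitted = sum(w * math.exp(-mu * path_length_cm) for w, mu in spectrum)
    return -math.log(transmitted / total_weight)


def monochromatic_projection(path_length_cm, spectrum=SPECTRUM):
    """Ideal linear line integral using the spectrum-weighted mean attenuation."""
    total_weight = sum(w for w, _ in spectrum)
    mu_mean = sum(w * mu for w, mu in spectrum) / total_weight
    return mu_mean * path_length_cm


if __name__ == "__main__":
    for length in (0.1, 0.5, 1.0, 2.0):
        poly = polychromatic_projection(length)
        mono = monochromatic_projection(length)
        # Effective attenuation poly/length drops as the beam hardens.
        print(f"L={length:.1f} cm  poly={poly:.3f}  mono={mono:.3f}  "
              f"mu_eff={poly / length:.3f} 1/cm")
```

With these toy values the effective attenuation falls from about 2.8 to about 1.3 per cm as the path length grows from 0.1 to 2 cm, while the monochromatic model stays at 3.0 per cm.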

Results: Within specified regions of interest, the mean CT number deviation between simulated and real data was less than 2%, making the simulation tool suitable for the intended tasks. In total, 14,000 metal scenarios in the head, thorax, and pelvis regions were simulated. For the clinical benchmark, a set of metrics covering CT number accuracy, noise, image sharpness, streak amplitude, structural integrity, and the effect on proton therapy range was defined for a range of clinical scenarios. The metal scenarios cover the most relevant clinical use cases, ranging from small metal implants such as fiducial markers to large implants such as hip replacements. Both the simulation tools and the benchmark with its test cases have been made publicly available.
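Three of the listed quantities, ROI CT number deviation, noise, and the effect on proton range, can be sketched with simple ROI statistics. This is not the benchmark's actual metric definitions. Assumptions: rectangular ROIs, a deviation normalized by (measured mean + 1000 HU) because the abstract does not state the normalization, and a hypothetical single-line HU-to-relative-stopping-power calibration in place of a clinical piecewise-linear curve. The range effect is expressed as a difference in water-equivalent path length (WEPL) along one beam path.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Image = List[List[float]]        # 2D image as rows of CT numbers (HU)
Roi = Tuple[int, int, int, int]  # (row_start, row_stop, col_start, col_stop), half-open


def roi_values(image: Image, roi: Roi) -> List[float]:
    """Flatten the pixel values inside a rectangular ROI."""
    r0, r1, c0, c1 = roi
    return [v for row in image[r0:r1] for v in row[c0:c1]]


def ct_number_deviation_percent(simulated: Image, measured: Image, roi: Roi) -> float:
    """ROI mean CT number deviation in percent of (measured mean + 1000 HU).

    Assumption: normalizing by HU + 1000 (relative linear attenuation) avoids
    dividing by ~0 HU in water; the paper's normalization is not given here.
    """
    sim_mean = mean(roi_values(simulated, roi))
    meas_mean = mean(roi_values(measured, roi))
    return 100.0 * abs(sim_mean - meas_mean) / (meas_mean + 1000.0)


def roi_noise_hu(image: Image, roi: Roi) -> float:
    """Noise as the population standard deviation inside a nominally uniform ROI."""
    return pstdev(roi_values(image, roi))


def hu_to_rsp(hu: float) -> float:
    """Hypothetical single-line HU -> relative stopping power calibration.

    Clinical calibration curves are piecewise linear; illustration only.
    """
    return max(0.0, 1.0 + hu / 1000.0)


def range_shift_mm(artifact_profile: Sequence[float],
                   reference_profile: Sequence[float],
                   pixel_mm: float) -> float:
    """WEPL difference (artifact minus reference) along one beam path, in mm."""
    wepl_artifact = sum(hu_to_rsp(h) for h in artifact_profile) * pixel_mm
    wepl_reference = sum(hu_to_rsp(h) for h in reference_profile) * pixel_mm
    return wepl_artifact - wepl_reference
```

For example, a dark streak of -200 HU spanning 10 mm of an otherwise water-equivalent 100 mm path makes the computed WEPL about 2 mm shorter under this toy calibration. This illustrates why residual streaks matter for proton range even when they look modest in the image.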

Conclusions: We developed and distributed tools and datasets for the development and evaluation of MAR algorithms. This is the first comprehensive evaluation benchmark covering a large number of clinically realistic metal artifact scenarios.
