Nils Peters, Eri Haneda, Jiayong Zhang, Grigorios Karageorgos, Wenjun Xia, Joost Verburg, Ge Wang, Harald Paganetti, Bruno De Man
A hybrid training database and evaluation benchmark for assessing metal artifact reduction methods for X-ray CT imaging
Medical Physics, vol. 52, no. 10, e70020. Published 2025-10-01. DOI: 10.1002/mp.70020 (https://doi.org/10.1002/mp.70020)
Abstract
Background: Metal artifacts significantly degrade image quality in computed tomography (CT), obscuring or even mimicking pathology. Although many metal artifact reduction (MAR) algorithms have been proposed, no comprehensive, clinically relevant evaluation benchmark exists. A major contributing factor is the lack of artifact-free ground-truth data in clinical cases. Similarly, deep-learning-based algorithms are hindered by the lack of paired training datasets with and without artifacts.
Purpose: We propose the simulation of a large training database for deep-learning-based MAR algorithms and the definition of a comprehensive evaluation benchmark for MAR. To this end, we use and validate a framework for the realistic simulation of metal artifacts on clinical CT data.
Methods: A clinical and a generic CT scanner geometry are modelled in the CatSim CT simulator within the open-access toolkit XCIST. Since most MAR research is performed in 2D, all datasets are simulated in 2D. The metal artifact simulation capability is experimentally validated using CT phantom scans containing various metal types and geometries. The tool is then used with two public CT databases to simulate metal artifact scenarios as training data for deep-learning algorithms. Lastly, a benchmark is defined for clinically realistic metal artifact scenarios and applied to a numerical MAR algorithm and a deep-learning-based MAR algorithm.
Results: Within specified regions of interest, the mean CT number deviation between simulation and real data was less than 2%, making the simulation tool suitable for the intended tasks. In total, 14,000 metal scenarios in the head, thorax, and pelvis regions were simulated. For the clinical benchmark, a set of metrics covering CT number accuracy, noise, image sharpness, streak amplitude, structural integrity, and the effect on range in proton therapy was defined for a range of clinical scenarios. The metal scenarios covered the most relevant clinical use cases, from small metal implants such as fiducial markers to large metal implants such as hip replacements. Both the simulation tools and the benchmark with the test cases were made publicly available.
Conclusions: We developed and distributed tools and datasets for the development and evaluation of MAR algorithms. This is the first comprehensive evaluation benchmark covering a large number of clinically realistic metal artifact scenarios.
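Illustrative sketch: The Results report a mean CT number deviation between simulation and real data within specified regions of interest. The abstract does not state how that deviation is normalised, so the following is only a minimal sketch of one plausible way to compute such a figure, not the authors' method. The function name `roi_mean_deviation`, the nested-list image representation, and the choice to normalise by the measured ROI mean are all assumptions made for illustration.

```python
from statistics import fmean


def roi_mean_deviation(simulated, measured, roi_mask):
    """Compare the mean CT number of two 2D images inside one ROI.

    All three arguments are equally sized nested lists (rows of pixels);
    roi_mask holds truthy values for pixels inside the region of interest.
    Returns (absolute deviation, relative deviation).
    """
    sim_vals, meas_vals = [], []
    for sim_row, meas_row, mask_row in zip(simulated, measured, roi_mask):
        for sim_px, meas_px, inside in zip(sim_row, meas_row, mask_row):
            if inside:
                sim_vals.append(sim_px)
                meas_vals.append(meas_px)
    if not sim_vals:
        raise ValueError("ROI mask selects no pixels")

    mean_sim = fmean(sim_vals)
    mean_meas = fmean(meas_vals)
    abs_dev = mean_sim - mean_meas

    # Assumption: normalise by the measured ROI mean. This is undefined
    # when that mean is 0, so NaN is returned in that case.
    rel_dev = abs_dev / abs(mean_meas) if mean_meas != 0 else float("nan")
    return abs_dev, rel_dev


# Toy 4x4 example with synthetic values (not data from the paper).
measured = [[1000.0] * 4 for _ in range(4)]
simulated = [[1010.0] * 4 for _ in range(4)]
mask = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(4)] for r in range(4)]

abs_dev, rel_dev = roi_mean_deviation(simulated, measured, mask)
print(f"absolute deviation: {abs_dev:.1f}, relative deviation: {rel_dev:.2%}")
```

A relative figure of this kind becomes unstable when the reference mean is close to zero, which is why the sketch also returns the absolute deviation.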