Eri Haneda, Nils Peters, Jiayong Zhang, Grigorios Karageorgos, Wenjun Xia, Harald Paganetti, Ge Wang, Yi Guo, Jianhua Ma, Hyoung Suk Park, Kiwan Jeon, Fuxin Fan, Mareike Thies, Bruno De Man
AAPM CT metal artifact reduction grand challenge.
Medical Physics, vol. 52, no. 10, e70050. Journal article, published 2025-10-01. DOI: 10.1002/mp.70050 (https://doi.org/10.1002/mp.70050)
Citations: 0
Abstract
Background: Metal artifact reduction (MAR) is a long-standing challenge in CT imaging. The presence of highly attenuating objects, such as dental fillings, hip prostheses, spinal screws/rods, and gold fiducial markers, can introduce severe streak artifacts in CT images, often reducing their diagnostic value. Existing CT MAR studies typically define their own test cases and evaluation metrics, making it difficult to objectively and comprehensively compare the performance of different MAR methods. There is a widespread need for a universal CT MAR image quality benchmark to evaluate the clinical impact of new MAR methods and compare them to state-of-the-art techniques.
Purpose: The goal of the AAPM CT Metal Artifact Reduction (CT-MAR) grand challenge was to create and distribute a clinically representative 2D MAR performance benchmark, and to invite participants to objectively compare the performance of their MAR methods based on this benchmark. A secondary goal was to facilitate MAR development by disseminating a MAR training database and tools. After completion of the grand challenge, the MAR benchmark and the MAR training database will remain publicly accessible for future MAR developments and benchmarking.
Methods: Grand challenge participants were invited to submit results for their MAR algorithm. The challenge organizers provided 14,000 CT training datasets generated using a hybrid data simulation framework that combined real patient images (including lung, abdomen, liver, head, and pelvis) with virtual metal objects. Each training dataset included five types of data: CT sinograms (uncorrected and metal-free), CT reconstructed images (uncorrected and metal-free), and metal masks. In the final evaluation phase, 29 clinical uncorrected datasets with metal were provided in both the sinogram and image domains for participants to process with their MAR algorithms. Their results were evaluated using eight clinically relevant image quality metrics. The final ranking was determined and compared to an established normalized metal artifact reduction (NMAR) reference method. Additionally, we conducted a survey to better understand the methodologies used by participants.
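To make the notion of a sinogram-domain correction concrete, the following is a minimal sketch of the simplest classical approach: linear interpolation across the metal trace in each projection view. It is not the NMAR reference method and not any participant's algorithm. The function name, the array layout (views by detector channels), and the availability of a sinogram-domain metal trace (for example, obtained by forward projecting a metal mask) are assumptions made for illustration only.

```python
import numpy as np


def interpolate_metal_trace(sinogram, trace):
    """Replace metal-affected samples by 1D linear interpolation per view.

    sinogram: 2D array, shape (n_views, n_detector_channels) -- assumed layout.
    trace:    2D bool array, same shape, True where a ray crosses metal
              (hypothetical input; the source does not specify this format).
    """
    sinogram = np.asarray(sinogram, dtype=float)
    trace = np.asarray(trace, dtype=bool)
    if sinogram.shape != trace.shape:
        raise ValueError("sinogram and trace must have the same shape")

    corrected = sinogram.copy()
    channels = np.arange(sinogram.shape[1])
    for view in range(sinogram.shape[0]):
        bad = trace[view]
        # Skip views with no metal, or with no clean samples to interpolate from.
        if not bad.any() or bad.all():
            continue
        # Fill metal-affected channels from the nearest clean neighbours.
        corrected[view, bad] = np.interp(
            channels[bad], channels[~bad], sinogram[view, ~bad]
        )
    return corrected


if __name__ == "__main__":
    # Toy data: a linear ramp per view, corrupted inside a two-channel trace.
    ramp = np.tile(np.linspace(0.0, 7.0, 8), (3, 1))
    trace = np.zeros(ramp.shape, dtype=bool)
    trace[:, 3:5] = True
    corrupted = ramp.copy()
    corrupted[trace] = 100.0
    print(np.allclose(interpolate_metal_trace(corrupted, trace), ramp))
```

The corrected sinogram would then be passed to a reconstruction step, which is outside the scope of this sketch. A baseline of this kind is typically a starting point for comparison rather than a competitive method.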
Results: A total of 106 teams registered for the challenge, of which 26 completed all phases. Of these, 92% (including all top ten teams) used a deep learning (DL) approach, employing a variety of network architectures such as UNet, ResNet, GAN, diffusion models, and transformers. Additionally, 22% of the teams (including the top three teams) combined sinogram- and image-domain approaches. The scores were broadly distributed: the competition was marked by diverse methods and a wide range of results, some of them exceptional. More than 70% of the teams achieved a better overall score than the popular baseline NMAR method.
Conclusions: The CT-MAR grand challenge provided an opportunity to benchmark state-of-the-art MAR algorithms. Our hybrid data generation framework was a powerful tool for simulating large-scale realistic datasets for MAR algorithm development. A clinically relevant universal MAR benchmark offered an objective and meaningful way to compare different approaches. The training data and benchmark were published online to support future MAR development.