AAPM CT metal artifact reduction grand challenge.

IF 3.2

Medical Physics. Pub Date: 2025-10-01. DOI: 10.1002/mp.70050

Eri Haneda, Nils Peters, Jiayong Zhang, Grigorios Karageorgos, Wenjun Xia, Harald Paganetti, Ge Wang, Yi Guo, Jianhua Ma, Hyoung Suk Park, Kiwan Jeon, Fuxin Fan, Mareike Thies, Bruno De Man

{"title":"AAPM CT metal artifact reduction grand challenge.","authors":"Eri Haneda, Nils Peters, Jiayong Zhang, Grigorios Karageorgos, Wenjun Xia, Harald Paganetti, Ge Wang, Yi Guo, Jianhua Ma, Hyoung Suk Park, Kiwan Jeon, Fuxin Fan, Mareike Thies, Bruno De Man","doi":"10.1002/mp.70050","DOIUrl":null,"url":null,"abstract":"Background: Metal artifact reduction (MAR) is a long-standing challenge in CT imaging. The presence of highly attenuating objects, such as dental fillings, hip prostheses, spinal screws/rods, and gold fiducial markers, can introduce severe streak artifacts in CT images, often reducing their diagnostic value. Existing CT MAR studies typically define their own test cases and evaluation metrics, making it difficult to objectively and comprehensively compare the performance of different MAR methods. There is a widespread need for a universal CT MAR image quality benchmark to evaluate the clinical impact of new MAR methods and compare them to state-of-the-art techniques.Purpose: The goal of the AAPM CT Metal Artifact Reduction (CT-MAR) grand challenge was to create and distribute a clinically representative 2D MAR performance benchmark, and to invite participants to objectively compare the performance of their MAR methods based on this benchmark. A secondary goal was to facilitate MAR development by disseminating a MAR training database and tools. After completion of the grand challenge, the MAR benchmark and the MAR training database will remain publicly accessible for future MAR developments and benchmarking.Methods: Grand challenge participants were invited to submit results for their MAR algorithm. The challenge organizers provided 14,000 CT training datasets generated using a hybrid data simulation framework that combined real patient images-including lung, abdomen, liver, head, and pelvis-with virtual metal objects. 
Each training dataset included five types of data: CT sinograms (uncorrected and metal-free), CT reconstructed images (uncorrected and metal-free), and metal masks. In the final evaluation phase, 29 clinical uncorrected datasets with metal were provided in both the sinogram and image domains for participants to process with their MAR algorithms. Their results were evaluated using eight clinically relevant image quality metrics. The final ranking was determined and compared to an established normalized metal artifact reduction (NMAR) reference method. Additionally, we conducted a survey to better understand the methodologies used by participants.Results: A total of 106 teams registered for the challenge, with 26 teams completing all phases of the challenge. 92% of these-including all top ten teams-used a deep learning (DL) approach, employing a variety of network architectures such as UNet, ResNet, GAN, diffusion models, and transformers. Additionally, 22% of the teams-including the top three teams-utilized a combination of sinogram- and image-domain approaches. The results showed a broad distribution of the scores. Overall, the competition was marked by diverse methods and a wide range of results, including some truly exceptional results. More than 70% of the teams achieved a better overall score than the popular baseline NMAR method.Conclusions: The CT-MAR grand challenge provided an opportunity to benchmark state-of-the-art MAR algorithms. Our hybrid data generation framework was a powerful tool for simulating large-scale realistic datasets for MAR algorithm development. A clinically relevant universal MAR benchmark offered an objective and meaningful way to compare different approaches. 
The training data and benchmark were published online to support future MAR development.","PeriodicalId":94136,"journal":{"name":"Medical physics","volume":"52 10","pages":"e70050"},"PeriodicalIF":3.2000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical physics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/mp.70050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}


Abstract

Background: Metal artifact reduction (MAR) is a long-standing challenge in CT imaging. The presence of highly attenuating objects, such as dental fillings, hip prostheses, spinal screws/rods, and gold fiducial markers, can introduce severe streak artifacts in CT images, often reducing their diagnostic value. Existing CT MAR studies typically define their own test cases and evaluation metrics, making it difficult to objectively and comprehensively compare the performance of different MAR methods. There is a widespread need for a universal CT MAR image quality benchmark to evaluate the clinical impact of new MAR methods and compare them to state-of-the-art techniques.

Purpose: The goal of the AAPM CT Metal Artifact Reduction (CT-MAR) grand challenge was to create and distribute a clinically representative 2D MAR performance benchmark, and to invite participants to objectively compare the performance of their MAR methods based on this benchmark. A secondary goal was to facilitate MAR development by disseminating a MAR training database and tools. After completion of the grand challenge, the MAR benchmark and the MAR training database will remain publicly accessible for future MAR developments and benchmarking.

Methods: Grand challenge participants were invited to submit results for their MAR algorithms. The challenge organizers provided 14,000 CT training datasets generated using a hybrid data simulation framework that combined real patient images (including lung, abdomen, liver, head, and pelvis) with virtual metal objects. Each training dataset included five types of data: CT sinograms (uncorrected and metal-free), CT reconstructed images (uncorrected and metal-free), and metal masks. In the final evaluation phase, 29 clinical uncorrected datasets with metal were provided in both the sinogram and image domains for participants to process with their MAR algorithms. Their results were evaluated using eight clinically relevant image quality metrics. The final ranking was determined and compared to an established normalized metal artifact reduction (NMAR) reference method. Additionally, we conducted a survey to better understand the methodologies used by participants.
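As a rough illustration of how one training case and a reference-based score might be represented, the sketch below groups the five data types into a record and computes a root-mean-square error outside the metal mask. The names `TrainingCase` and `masked_rmse` are hypothetical, and the metric is an assumption chosen for illustration only: the abstract does not name the eight metrics actually used, and this is not claimed to be one of them.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List

Image = List[List[float]]  # row-major 2D array (sinogram or reconstructed image)
Mask = List[List[bool]]    # True where a pixel belongs to metal


@dataclass
class TrainingCase:
    """Hypothetical container for the five data types in one training dataset."""
    sinogram_uncorrected: Image
    sinogram_metal_free: Image
    image_uncorrected: Image
    image_metal_free: Image
    metal_mask: Mask


def masked_rmse(candidate: Image, reference: Image, metal_mask: Mask) -> float:
    """RMSE between two images, ignoring pixels flagged as metal.

    Illustrative metric only; not one of the challenge's eight metrics.
    """
    squared_error_sum = 0.0
    pixel_count = 0
    for cand_row, ref_row, mask_row in zip(candidate, reference, metal_mask):
        for cand, ref, is_metal in zip(cand_row, ref_row, mask_row):
            if not is_metal:
                squared_error_sum += (cand - ref) ** 2
                pixel_count += 1
    if pixel_count == 0:
        raise ValueError("metal mask covers the whole image")
    return sqrt(squared_error_sum / pixel_count)


# Toy 2x2 example with invented values: the metal pixel (bottom right) is excluded.
mar_output = [[0.0, 3.0], [0.0, 100.0]]
metal_free = [[0.0, 0.0], [0.0, 0.0]]
mask = [[False, False], [False, True]]
print(round(masked_rmse(mar_output, metal_free, mask), 3))  # 1.732
```

Excluding the metal region is a common design choice for this kind of comparison, since the metal pixels themselves are not the tissue of diagnostic interest; whether the challenge metrics did the same is not stated in the abstract.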

Results: A total of 106 teams registered for the challenge, and 26 teams completed all phases. Of these, 92% (including all top ten teams) used a deep learning (DL) approach, employing a variety of network architectures such as UNet, ResNet, GAN, diffusion models, and transformers. Additionally, 22% of the teams (including the top three teams) utilized a combination of sinogram- and image-domain approaches. The scores were broadly distributed: the competition was marked by diverse methods and a wide range of results, including some truly exceptional ones. More than 70% of the teams achieved a better overall score than the popular baseline NMAR method.
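The comparison against the NMAR baseline can be sketched as a simple count of teams whose overall score beats the baseline score. The function name `teams_beating_baseline`, the score direction (higher is better by default), and all numbers below are assumptions for illustration; the abstract does not say how the eight metrics were combined into an overall score, and the values are invented placeholders rather than challenge results.

```python
from typing import Dict, List, Tuple


def teams_beating_baseline(
    team_scores: Dict[str, float],
    baseline_score: float,
    higher_is_better: bool = True,
) -> Tuple[float, List[str]]:
    """Return the fraction of teams that beat the baseline and their names, best first."""
    if not team_scores:
        raise ValueError("no team scores given")
    if higher_is_better:
        winners = [team for team, score in team_scores.items() if score > baseline_score]
    else:
        winners = [team for team, score in team_scores.items() if score < baseline_score]
    # Best team first: descending when higher is better, ascending otherwise.
    winners.sort(key=lambda team: team_scores[team], reverse=higher_is_better)
    return len(winners) / len(team_scores), winners


# Invented placeholder scores, not taken from the challenge.
placeholder_scores = {"team_a": 0.9, "team_b": 0.7, "team_c": 0.4, "team_d": 0.6}
fraction, ranked = teams_beating_baseline(placeholder_scores, baseline_score=0.5)
print(fraction, ranked)  # 0.75 ['team_a', 'team_b', 'team_d']
```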

Conclusions: The CT-MAR grand challenge provided an opportunity to benchmark state-of-the-art MAR algorithms. Our hybrid data generation framework was a powerful tool for simulating large-scale realistic datasets for MAR algorithm development. A clinically relevant universal MAR benchmark offered an objective and meaningful way to compare different approaches. The training data and benchmark were published online to support future MAR development.


