Automated assessment of task-based performance of digital mammography and tomosynthesis systems using an anthropomorphic breast phantom and deep learning-based scoring.
Impact factor: 1.9; JCR Q3 (Radiology, Nuclear Medicine & Medical Imaging)
Andrey Makeev, Kaiyan Li, Mark A Anastasio, Arthur Emig, Paul Jahnke, Stephen J Glick
Journal of Medical Imaging. Published 2025-01-01 (Epub 2024-10-15). DOI: 10.1117/1.JMI.12.S1.S13005 (https://doi.org/10.1117/1.JMI.12.S1.S13005)
Abstract
Purpose: Conventional metrics used for assessing digital mammography (DM) and digital breast tomosynthesis (DBT) image quality, including noise, spatial resolution, and detective quantum efficiency, do not necessarily predict how well a system will perform in a clinical task. Existing phantom-based methods have limitations of their own, such as unrealistic uniform backgrounds, subjective scoring by human readers, and regular signal patterns unrepresentative of common clinical findings. We attempted to address this problem with a realistic breast phantom with random hydroxyapatite microcalcifications and semi-automated deep learning-based image scoring. Our goal was to develop a methodology for objective task-based assessment of image quality for DBT and DM systems that comprises an anthropomorphic phantom, a detection task (microcalcification clusters), and automated performance evaluation using a convolutional neural network.
Approach: Experimental 2D and pseudo-3D mammograms of an anthropomorphic inkjet-printed breast phantom with inserted microcalcification clusters were collected on clinical mammography systems to train a signal-present/signal-absent image classifier based on the ResNet-18 architecture. In a separate validation study using simulations, this ResNet-18 classifier was shown to approach the performance of an ideal observer. Microcalcification detection performance was evaluated at four dose levels using receiver operating characteristic (ROC) analysis, with the area under the ROC curve (AUC) as the figure of merit. To demonstrate the use of this evaluation approach for assessing different technologies, the method was applied to two different mammography systems, as well as to mammograms with re-binned pixels emulating a lower-resolution X-ray detector.
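The ROC figure of merit described above can be illustrated with a minimal sketch. It assumes the classifier emits one scalar score per image and computes the empirical AUC as the fraction of (signal-present, signal-absent) score pairs that are ranked correctly, with ties counted as one half. The function name `empirical_auc` and the scores in the usage note are hypothetical and are not taken from the study; the abstract does not state how the authors computed AUC.

```python
def empirical_auc(present_scores, absent_scores):
    """Empirical AUC (Mann-Whitney form).

    Returns the fraction of (signal-present, signal-absent) score pairs
    in which the signal-present image receives the higher score.
    Tied pairs contribute 0.5. Inputs are hypothetical classifier outputs.
    """
    if not present_scores or not absent_scores:
        raise ValueError("both score lists must be non-empty")

    correct = 0.0
    for p in present_scores:
        for a in absent_scores:
            if p > a:
                correct += 1.0   # pair ranked correctly
            elif p == a:
                correct += 0.5   # tie counts as half
    return correct / (len(present_scores) * len(absent_scores))
```

With illustrative scores such as `empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])`, eight of the nine pairs are ranked correctly, so the function returns 8/9. Repeating this computation on the scores from each dose level would give one AUC per level.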
Results: Microcalcification detectability, as assessed by the deep learning classifier, varied with the exposure incident on the breast phantom for both DM and DBT. At full dose, experimental AUC was 0.96 for DM and 0.95 for DBT, whereas at half dose it dropped to 0.85 and 0.71, respectively. For DM, AUC decreased significantly at the larger effective pixel size obtained by re-binning. The task-based assessment approach also showed the superiority of a newer mammography system over an older one.
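Pixel re-binning, used above to emulate a larger effective pixel size, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the binning factor or say whether pixel values were summed or averaged. The sketch sums non-overlapping `factor` x `factor` blocks, and the function name `rebin` is hypothetical.

```python
def rebin(image, factor=2):
    """Combine non-overlapping factor x factor pixel blocks by summation.

    `image` is a 2D list of pixel values. Summation is an assumption;
    averaging would simply divide each output value by factor**2.
    """
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")

    return [
        [
            # Sum every pixel in the block whose top-left corner is (r, c).
            sum(image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]
```

Applied with `factor=2`, a 4 x 4 image becomes a 2 x 2 image whose pixels each cover four of the original detector elements, which halves the sampling resolution along each axis.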
Conclusions: An objective task-based methodology for assessing the image quality of mammography and tomosynthesis systems is proposed. Possible uses for this tool include quality control, acceptance and constancy testing, assessment of the safety and effectiveness of new technology for regulatory submissions, and system optimization. The results of this study showed that the proposed evaluation method, which uses a deep learning model observer, can track differences in microcalcification signal detectability under varied exposure conditions.
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.