Dependence of observer task on conclusions drawn from in silico trials evaluating the performance of full-field digital mammography and digital breast tomosynthesis.

Impact factor: 1.9. JCR quartile: Q3. Category: Radiology, Nuclear Medicine & Medical Imaging

Journal of Medical Imaging. Pub Date: 2025-01-01. Epub Date: 2025-05-19. DOI: 10.1117/1.JMI.12.S1.S13014

Dan Li, Andrey Makeev, Stephen J Glick

Journal of Medical Imaging, volume 12, Suppl 1, article S13014. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12087637/pdf/

Citations: 0

Abstract

Purpose: We aim to refine the task-based evaluation of full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) through in silico trials (ISTs). Previous ISTs have mostly employed lesion detection tasks for task-based performance evaluation. This differs from clinical practice, where the radiologist's task normally involves both detecting whether a suspicious lesion is present and rating how likely it is that the lesion is malignant. We hypothesize that ISTs may reach differing conclusions depending on the defined task.

Approach: The shape of the masses was employed as a surrogate indicator for malignancy, with spiculated masses representing malignant lesions and lobular masses representing benign lesions. A convolutional neural network (CNN) model observer was then trained to differentiate between spiculated and nonspiculated masses using Monte Carlo-simulated breast images. This approach leverages prior research demonstrating that CNN-based frameworks can approximate the performance of an ideal observer. We systematically evaluated the effects of varying dose levels, detector pixel size, and projection angular range on the CNN model observer's performance in both detection and classification tasks, assessing the performance of both FFDM and DBT systems.
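As an illustration of how one observer can be scored on the two tasks, the sketch below computes an area under the ROC curve (AUC) separately for detection (mass present vs. absent) and classification (spiculated vs. lobular). The abstract does not name the figure of merit, so AUC is an assumption here, and the Gaussian scores are synthetic stand-ins for CNN observer outputs, not data from the study.

```python
import random


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the AUC: the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def simulate_scores(n_cases, mean, rng):
    # Hypothetical observer outputs: unit-variance Gaussian scores.
    return [rng.gauss(mean, 1.0) for _ in range(n_cases)]


rng = random.Random(0)

# Detection task: mass-present vs. mass-absent images.
# The separation of 2.0 is an arbitrary illustrative choice.
detection_auc = auc(simulate_scores(500, 2.0, rng),
                    simulate_scores(500, 0.0, rng))

# Classification task: spiculated vs. lobular masses.
# The smaller separation of 0.8 is likewise an assumption.
classification_auc = auc(simulate_scores(500, 0.8, rng),
                         simulate_scores(500, 0.0, rng))

print(f"detection AUC:      {detection_auc:.3f}")
print(f"classification AUC: {classification_auc:.3f}")
```

Scoring each task separately makes it possible for a system parameter such as dose or pixel size to leave one figure of merit nearly unchanged while shifting the other.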

Results: Our findings demonstrate significant variations in conclusions drawn from IST models depending on whether the task is lesion detection or classification. Specifically, we observed that varying average glandular dose levels from 2.0 to 0.5 mGy had little effect on the detection of masses, whereas a small but significant decrease in performance with reduced dose was observed with the classification task across FFDM and DBT. Similarly, reduced spatial resolution resulted in a small but significant decrease in performance with the classification task for FFDM. For DBT ISTs, we also observed that the preferred angular range varies depending on whether the task is detection or classification.
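The abstract reports "small but significant" changes without stating the statistical procedure. One common way to judge such a difference is a bootstrap interval on the change in a figure of merit between two conditions. The sketch below is not the authors' analysis: it assumes AUC as the figure of merit, uses synthetic Gaussian scores, and the condition names `high_dose` and `low_dose` are hypothetical labels.

```python
import random


def auc(pos, neg):
    # Mann-Whitney AUC with ties counted as one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def resample(cases, rng):
    # Draw a bootstrap sample of the same size, with replacement.
    return [rng.choice(cases) for _ in cases]


def bootstrap_auc_diff(pos_a, neg_a, pos_b, neg_b, n_boot=200, seed=0):
    """Approximate 95% percentile interval for AUC(a) - AUC(b),
    resampling the cases of each condition independently."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        auc_a = auc(resample(pos_a, rng), resample(neg_a, rng))
        auc_b = auc(resample(pos_b, rng), resample(neg_b, rng))
        diffs.append(auc_a - auc_b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


rng = random.Random(1)


def scores(mean, n=100):
    # Synthetic observer scores, unit-variance Gaussian.
    return [rng.gauss(mean, 1.0) for _ in range(n)]


# Hypothetical classification scores at two dose levels (assumed separations).
high_dose = (scores(2.0), scores(0.0))
low_dose = (scores(0.5), scores(0.0))

ci_low, ci_high = bootstrap_auc_diff(*high_dose, *low_dose)
print(f"95% interval for the AUC difference: [{ci_low:.3f}, {ci_high:.3f}]")
```

If the interval excludes zero, the difference would usually be called significant at roughly the 5% level. A real study with paired cases across conditions would resample the cases jointly rather than independently.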

Conclusions: Integrating classification tasks into ISTs and potentially physical phantom studies can provide additional information in the evaluation of clinical breast imaging systems. This methodology can enhance the reliability of performance assessments for new breast imaging technologies. Depending on the study's objective, ISTs and physical phantom studies should aim to employ tasks that closely model actual clinical scenarios.

Source journal

Journal of Medical Imaging (category: Radiology, Nuclear Medicine & Medical Imaging)

CiteScore

4.10

Self-citation rate

4.20%

Articles published

Journal description: JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.