Improving annotation efficiency for fully labeling a breast mass segmentation dataset.

IF 1.7 Q3 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

Journal of Medical Imaging Pub Date : 2025-05-01 Epub Date: 2025-05-21 DOI:10.1117/1.JMI.12.3.035501

Vaibhav Sharma, Alina Jade Barnett, Julia Yang, Sangwook Cheon, Giyoung Kim, Fides Regina Schwartz, Avivah Wang, Neal Hall, Lars Grimm, Chaofan Chen, Joseph Y Lo, Cynthia Rudin

Journal of Medical Imaging, volume 12, issue 3, article 035501. Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12094908/pdf/


Abstract

Purpose: Breast cancer remains a leading cause of death for women. Screening programs are deployed to detect cancer at early stages. One current barrier identified by breast imaging researchers is a shortage of labeled image datasets. Addressing this problem is crucial to improve early detection models. We present an active learning (AL) framework for segmenting breast masses from 2D digital mammography, and we publish labeled data. Our method aims to reduce the input needed from expert annotators to reach a fully labeled dataset.

Approach: We create a dataset of 1136 mammographic masses with pixel-wise binary segmentation labels, with the test subset labeled independently by two different teams. With this dataset, we simulate a human annotator within an AL framework to develop and compare AI-assisted labeling methods, using a discriminator model and a simulated oracle to collect acceptable segmentation labels. A UNet model is retrained on these labels, generating new segmentations. We evaluate various oracle heuristics by the percentage of segmentations that the oracle relabels, and we measure the quality of the proposed labels by computing the intersection over union (IoU) on a validation dataset.
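The two evaluation quantities named in the Approach (intersection over union, and the percentage of segmentations the oracle relabels) can be illustrated with a minimal sketch. This is not the authors' implementation: the acceptance rule below (accept a proposed mask when its IoU with the reference label reaches a threshold) is only one hypothetical oracle heuristic, and the names `iou`, `simulate_oracle`, and `accept_iou`, as well as the threshold value, are assumptions made for illustration.

```python
from typing import List, Sequence

# A 2D binary mask: 1 marks a mass pixel, 0 marks background.
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly; treat that case as IoU = 1.
    return inter / union if union else 1.0


def simulate_oracle(proposals: List[Mask], truths: List[Mask],
                    accept_iou: float = 0.9) -> float:
    """Return the fraction of proposed masks the simulated oracle relabels.

    Hypothetical heuristic (an assumption, not taken from the paper):
    the oracle accepts a proposal when IoU >= accept_iou and relabels
    it otherwise.
    """
    if not proposals:
        return 0.0
    relabeled = sum(1 for p, t in zip(proposals, truths)
                    if iou(p, t) < accept_iou)
    return relabeled / len(proposals)


# Toy example: one proposal matches the reference, one does not.
truth = [[0, 1, 1],
         [0, 1, 1]]
good = [[0, 1, 1],
        [0, 1, 1]]
poor = [[1, 1, 0],
        [0, 0, 0]]

relabel_fraction = simulate_oracle([good, poor], [truth, truth])
```

In this toy case the first proposal would be accepted and the second sent back for relabeling, so half of the proposals need expert input. A lower relabel fraction at a comparable IoU is the kind of saving the abstract reports.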

Results: Our method reduces expert annotator input by 44%. We present a dataset of 1136 binary segmentation labels approved by board-certified radiologists and make the 143-image validation set public for comparison with other researchers' methods.

Conclusions: We demonstrate that AL can significantly improve the efficiency and time-effectiveness of creating labeled mammogram datasets. Our framework facilitates the development of high-quality datasets while minimizing manual effort in the domain of digital mammography.


Source journal

Journal of Medical Imaging (Radiology, Nuclear Medicine & Medical Imaging)

CiteScore: 4.10

Self-citation rate: 4.20%

Journal description: JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.