Vaibhav Sharma, Alina Jade Barnett, Julia Yang, Sangwook Cheon, Giyoung Kim, Fides Regina Schwartz, Avivah Wang, Neal Hall, Lars Grimm, Chaofan Chen, Joseph Y Lo, Cynthia Rudin
{"title":"提高乳腺质量分割数据的标注效率。","authors":"Vaibhav Sharma, Alina Jade Barnett, Julia Yang, Sangwook Cheon, Giyoung Kim, Fides Regina Schwartz, Avivah Wang, Neal Hall, Lars Grimm, Chaofan Chen, Joseph Y Lo, Cynthia Rudin","doi":"10.1117/1.JMI.12.3.035501","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Breast cancer remains a leading cause of death for women. Screening programs are deployed to detect cancer at early stages. One current barrier identified by breast imaging researchers is a shortage of labeled image datasets. Addressing this problem is crucial to improve early detection models. We present an active learning (AL) framework for segmenting breast masses from 2D digital mammography, and we publish labeled data. Our method aims to reduce the input needed from expert annotators to reach a fully labeled dataset.</p><p><strong>Approach: </strong>We create a dataset of 1136 mammographic masses with pixel-wise binary segmentation labels, with the test subset labeled independently by two different teams. With this dataset, we simulate a human annotator within an AL framework to develop and compare AI-assisted labeling methods, using a discriminator model and a simulated oracle to collect acceptable segmentation labels. A UNet model is retrained on these labels, generating new segmentations. We evaluate various oracle heuristics using the percentage of segmentations that the oracle relabels and measure the quality of the proposed labels by evaluating the intersection over union over a validation dataset.</p><p><strong>Results: </strong>Our method reduces expert annotator input by 44%. We present a dataset of 1136 binary segmentation labels approved by board-certified radiologists and make the 143-image validation set public for comparison with other researchers' methods.</p><p><strong>Conclusions: </strong>We demonstrate that AL can significantly improve the efficiency and time-effectiveness of creating labeled mammogram datasets. Our framework facilitates the development of high-quality datasets while minimizing manual effort in the domain of digital mammography.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"12 3","pages":"035501"},"PeriodicalIF":1.9000,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12094908/pdf/","citationCount":"0","resultStr":"{\"title\":\"Improving annotation efficiency for fully labeling a breast mass segmentation dataset.\",\"authors\":\"Vaibhav Sharma, Alina Jade Barnett, Julia Yang, Sangwook Cheon, Giyoung Kim, Fides Regina Schwartz, Avivah Wang, Neal Hall, Lars Grimm, Chaofan Chen, Joseph Y Lo, Cynthia Rudin\",\"doi\":\"10.1117/1.JMI.12.3.035501\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Purpose: </strong>Breast cancer remains a leading cause of death for women. Screening programs are deployed to detect cancer at early stages. One current barrier identified by breast imaging researchers is a shortage of labeled image datasets. Addressing this problem is crucial to improve early detection models. We present an active learning (AL) framework for segmenting breast masses from 2D digital mammography, and we publish labeled data. Our method aims to reduce the input needed from expert annotators to reach a fully labeled dataset.</p><p><strong>Approach: </strong>We create a dataset of 1136 mammographic masses with pixel-wise binary segmentation labels, with the test subset labeled independently by two different teams. With this dataset, we simulate a human annotator within an AL framework to develop and compare AI-assisted labeling methods, using a discriminator model and a simulated oracle to collect acceptable segmentation labels. A UNet model is retrained on these labels, generating new segmentations. We evaluate various oracle heuristics using the percentage of segmentations that the oracle relabels and measure the quality of the proposed labels by evaluating the intersection over union over a validation dataset.</p><p><strong>Results: </strong>Our method reduces expert annotator input by 44%. We present a dataset of 1136 binary segmentation labels approved by board-certified radiologists and make the 143-image validation set public for comparison with other researchers' methods.</p><p><strong>Conclusions: </strong>We demonstrate that AL can significantly improve the efficiency and time-effectiveness of creating labeled mammogram datasets. Our framework facilitates the development of high-quality datasets while minimizing manual effort in the domain of digital mammography.</p>\",\"PeriodicalId\":47707,\"journal\":{\"name\":\"Journal of Medical Imaging\",\"volume\":\"12 3\",\"pages\":\"035501\"},\"PeriodicalIF\":1.9000,\"publicationDate\":\"2025-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12094908/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Medical Imaging\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1117/1.JMI.12.3.035501\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/5/21 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q3\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1117/1.JMI.12.3.035501","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/5/21 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Improving annotation efficiency for fully labeling a breast mass segmentation dataset.
Purpose: Breast cancer remains a leading cause of death for women. Screening programs are deployed to detect cancer at early stages. One current barrier identified by breast imaging researchers is a shortage of labeled image datasets. Addressing this problem is crucial to improve early detection models. We present an active learning (AL) framework for segmenting breast masses from 2D digital mammography, and we publish labeled data. Our method aims to reduce the input needed from expert annotators to reach a fully labeled dataset.
Approach: We create a dataset of 1136 mammographic masses with pixel-wise binary segmentation labels, with the test subset labeled independently by two different teams. With this dataset, we simulate a human annotator within an AL framework to develop and compare AI-assisted labeling methods, using a discriminator model and a simulated oracle to collect acceptable segmentation labels. A UNet model is retrained on these labels, generating new segmentations. We evaluate various oracle heuristics using the percentage of segmentations that the oracle relabels and measure the quality of the proposed labels by evaluating the intersection over union over a validation dataset.
Results: Our method reduces expert annotator input by 44%. We present a dataset of 1136 binary segmentation labels approved by board-certified radiologists and make the 143-image validation set public for comparison with other researchers' methods.
Conclusions: We demonstrate that AL can significantly improve the efficiency and time-effectiveness of creating labeled mammogram datasets. Our framework facilitates the development of high-quality datasets while minimizing manual effort in the domain of digital mammography.
期刊介绍:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.