Performance evaluation of deep learning algorithms in MRI breast lesion segmentation and detection
Jupeng Zhang, Qi Wu, Jinhua Hu, Xiqi Zhu, Baosheng Li
Biomedical Signal Processing and Control, vol. 112, Article 108853. Published 2025-10-10. DOI: 10.1016/j.bspc.2025.108853
https://www.sciencedirect.com/science/article/pii/S1746809425013643
Abstract
Purpose
This study systematically evaluates the efficacy of deep learning (DL) algorithms for segmenting and detecting breast lesions in magnetic resonance imaging (MRI), focusing on segmentation accuracy and clinical applicability.
Methods
Following PRISMA-DTA guidelines, we searched PubMed, Embase, Scopus, and Web of Science and identified 19 eligible studies. Eligible studies applied DL to breast lesion segmentation and detection on MRI and reported complete segmentation performance data. Study quality was assessed with QUADAS-AI. A random-effects meta-analysis was performed, with segmentation accuracy quantified by the Dice similarity coefficient (DSC) and lesion detection performance by sensitivity. Heterogeneity was explored through meta-regression and subgroup analysis.
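The two outcome measures named above can be made concrete. The sketch below is a minimal illustration, assuming flattened binary voxel masks for segmentation and lesion-level true-positive/false-negative counts for detection. The function names `dice_coefficient` and `sensitivity` are hypothetical and not taken from the reviewed studies, which may define lesion matching and empty-mask handling differently.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    DSC = 2|A ∩ B| / (|A| + |B|), where A and B are the sets of foreground voxels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both the prediction and the reference
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total foreground voxels across both masks
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumption: two empty masks are treated as perfect agreement
    return 2.0 * intersection / total


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Lesion-level detection sensitivity, TP / (TP + FN)."""
    denom = true_positives + false_negatives
    if denom == 0:
        raise ValueError("no reference lesions to detect")
    return true_positives / denom
```

For example, `dice_coefficient([0, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 0])` overlaps on two voxels out of three predicted and three reference voxels, giving 4/6 ≈ 0.67, and detecting 43 of 50 reference lesions gives a sensitivity of 0.86.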
Results
The 19 studies evaluated DL algorithms such as U-Net, nnU-Net, and other CNN architectures. Segmentation DSC ranged from 0.61 to 0.97, with a pooled DSC of 0.82 (95% CI: 0.76–0.88). Pooled sensitivity across six studies was 0.86 (95% CI: 0.75–0.98). Subgroup analyses showed higher segmentation accuracy in multicenter studies (0.86 vs. 0.80), in studies with external validation (0.89 vs. 0.79), and with 3.0 T MRI scanners (0.88 vs. 0.83). Intensity normalization was also associated with higher accuracy (0.87 vs. 0.79). nnU-Net achieved the highest DSC (0.97). Significant heterogeneity (I² = 99.6%) and publication bias (p = 0.018) were observed.
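The pooled estimates and the I² statistic reported above come from random-effects modeling. The sketch below shows one common way to compute them, the DerSimonian–Laird estimator. The abstract does not state which between-study variance estimator the authors used, so this choice, the function name `dersimonian_laird`, and the normal-approximation 95% CI (±1.96 SE) are all assumptions for illustration.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, Tuple[float, float], float, float]:
    """Random-effects pooling (assumption: DerSimonian-Laird estimator).

    Returns (pooled estimate, normal-approximation 95% CI, tau^2, I^2 in percent).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching standard errors")
    # Inverse-variance (fixed-effect) weights and fixed-effect mean
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    # Method-of-moments between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    # Higgins' I^2: share of total variability due to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    return pooled, ci, tau2, i2
```

With per-study DSC values and standard errors as inputs, an I² close to 100%, like the 99.6% reported here, indicates that nearly all observed variability reflects between-study differences rather than sampling error. When a study reports only a 95% CI, an approximate standard error can be backed out by assuming a symmetric normal-based interval; for the pooled DSC interval of 0.76–0.88 this gives (0.88 - 0.76) / (2 × 1.96) ≈ 0.031. Whether the authors pooled raw or transformed DSC values is not stated in the abstract.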
Conclusion
DL algorithms show high accuracy in breast lesion segmentation and detection, particularly in multicenter studies and those with external validation. Future research should optimize algorithms to reduce heterogeneity and validate clinical applicability.
Journal Description
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with the practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of both engineering and clinical science. The scope of the journal is defined to include relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.