Lung lobe segmentation: performance of open-source MOOSE, TotalSegmentator, and LungMask models compared to a local in-house model.

Impact factor: 3.6. JCR Q1 (Radiology, Nuclear Medicine & Medical Imaging)

European Radiology Experimental. Pub Date: 2025-09-04. DOI: 10.1186/s41747-025-00623-9

Elaheh Amini, Ran Klein

European Radiology Experimental 9(1):86. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411369/pdf/

Citations: 0

Abstract

Background: Lung lobe segmentation is required to assess lobar function with nuclear imaging before surgical interventions. We evaluated the performance of open-source deep learning-based lung lobe segmentation tools, compared to a similar nnU-Net model trained on a smaller but more representative clinical dataset.

Materials and methods: We collated and semi-automatically segmented an internal dataset of 164 computed tomography scans and classified them for task difficulty as easy, moderate, or hard. The performance of three open-source models, namely multi-organ objective segmentation (MOOSE), TotalSegmentator, and LungMask, was assessed using Dice similarity coefficient (DSC), robust Hausdorff distance (rHd95), and normalized surface distance (NSD). Additionally, we trained, validated, and tested an nnU-Net model using our local dataset and compared its performance with that of the other software on the test subset. All models were evaluated for generalizability using an external competition dataset (LOLA11, n = 55).
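The three metrics named above can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code. It assumes each mask is a set of integer voxel indices, takes the 95th percentile by the nearest-rank rule and reports the larger of the two directed values, and leaves the NSD tolerance as a caller-supplied parameter, since the abstract does not state one. All function names are hypothetical, and the brute-force distance search is only practical for toy inputs.

```python
import math

Voxel = tuple[int, int, int]
_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a: set[Voxel], b: set[Voxel]) -> float:
    """Dice similarity coefficient: twice the overlap divided by the summed sizes."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def surface(mask: set[Voxel]) -> set[Voxel]:
    """Voxels of the mask with at least one 6-connected neighbor outside it."""
    return {
        (x, y, z)
        for (x, y, z) in mask
        if any((x + dx, y + dy, z + dz) not in mask for dx, dy, dz in _NEIGHBORS)
    }


def _distances(src, dst, spacing):
    """Distance in mm from each voxel in src to its nearest voxel in dst."""
    def to_mm(v):
        return tuple(c * s for c, s in zip(v, spacing))

    dst_mm = [to_mm(q) for q in dst]
    return [min(math.dist(to_mm(p), q) for q in dst_mm) for p in src]


def _percentile95(values):
    """Nearest-rank 95th percentile (one of several common conventions)."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def hd95(a, b, spacing=(1.0, 1.0, 1.0)) -> float:
    """Robust Hausdorff distance: max of the two directed 95th percentiles."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        return math.inf  # undefined when a surface is empty; inf is an assumption
    return max(
        _percentile95(_distances(sa, sb, spacing)),
        _percentile95(_distances(sb, sa, spacing)),
    )


def nsd(a, b, tolerance_mm, spacing=(1.0, 1.0, 1.0)) -> float:
    """Fraction of surface voxels lying within tolerance_mm of the other surface."""
    sa, sb = surface(a), surface(b)
    if not sa and not sb:
        return 1.0
    if not sa or not sb:
        return 0.0
    close = sum(d <= tolerance_mm for d in _distances(sa, sb, spacing))
    close += sum(d <= tolerance_mm for d in _distances(sb, sa, spacing))
    return close / (len(sa) + len(sb))


# Toy example: a 4x4x4 cube and the same cube shifted by one voxel along x.
cube_a = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
cube_b = {(x + 1, y, z) for (x, y, z) in cube_a}
print(dice(cube_a, cube_b))  # 0.75
print(hd95(cube_a, cube_b))  # 1.0
```

In this toy case a one-voxel shift lowers DSC to 0.75 while the surfaces stay within 1 mm of each other, which shows why overlap and surface-distance metrics can rank models differently.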

Results: TotalSegmentator outperformed MOOSE in DSC and NSD across all difficulty levels (p < 0.001), but not in rHd95 (p = 1.000). MOOSE and TotalSegmentator surpassed LungMask across metrics and difficulty classes (p < 0.001). Our model exceeded all other models on the internal dataset (n = 33) in all metrics, across all difficulty classes (p < 0.001), and on the external dataset. Missing lobes were correctly identified only by our model and LungMask in 3 and 1 of 7 cases, respectively.
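The abstract does not define how a missing lobe counts as correctly identified. One plausible reading, sketched below under stated assumptions, is that every lobe absent from the reference segmentation must also be absent from the prediction. The label convention (1 to 5 for the five lobes), the `min_voxels` noise threshold, and both function names are hypothetical and not taken from the paper.

```python
from collections import Counter

# Hypothetical label convention, not taken from the paper.
LOBES = {
    1: "left upper",
    2: "left lower",
    3: "right upper",
    4: "right middle",
    5: "right lower",
}


def present_lobes(labels, min_voxels=1):
    """Lobe labels that occur at least min_voxels times in a flat label sequence."""
    counts = Counter(labels)
    return {lobe for lobe in LOBES if counts[lobe] >= min_voxels}


def missing_lobes_identified(reference, prediction, min_voxels=1):
    """True if every lobe absent from the reference is also absent from the prediction.

    min_voxels applies to the prediction only, so a handful of stray voxels
    can be treated as noise rather than as a detected lobe (an assumption).
    """
    absent_in_reference = set(LOBES) - present_lobes(reference)
    absent_in_prediction = set(LOBES) - present_lobes(prediction, min_voxels)
    return absent_in_reference <= absent_in_prediction


# Toy case: the right middle lobe (label 4) is missing from the reference.
reference = [0, 1, 1, 2, 2, 3, 3, 5, 5]
prediction = [0, 1, 1, 2, 4, 3, 3, 5, 5]  # one stray voxel labeled as lobe 4
print(missing_lobes_identified(reference, prediction))  # False
print(missing_lobes_identified(reference, prediction, min_voxels=2))  # True
```

Counting the cases for which this check returns true would give a tally comparable in form to the "3 and 1 of 7 cases" reported above, although the authors' actual criterion may differ.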

Conclusion: Open-source segmentation tools perform well in straightforward cases but struggle in unfamiliar, complex cases. Training on diverse, specialized datasets can improve generalizability, emphasizing representative data over sheer quantity.

Relevance statement: Training lung lobe segmentation models on a local variety of cases improves accuracy, thus enhancing presurgical planning, ventilation-perfusion analysis, and disease localization, potentially impacting treatment decisions and patient outcomes in respiratory and thoracic care.

Key points: Deep learning models trained on non-specialized datasets struggle with complex lung anomalies, yet their real-world limitations are insufficiently assessed. Training an identical model on a smaller yet clinically diverse and representative cohort improved performance in challenging cases. Data diversity outweighs quantity in deep learning-based segmentation models. Accurate lung lobe segmentation may enhance presurgical assessment of lobar ventilation and perfusion function, optimizing clinical decision-making and patient outcomes.


