{"title":"Qualitative evaluation of automatic liver segmentation in computed tomography images for clinical use in radiation therapy","authors":"Dorea Maria Khalal , Souleyman Slimani , Zine Eddine Bouraoui , Hacene Azizi","doi":"10.1016/j.canrad.2025.104648","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>Segmentation of target volumes and organs at risk on computed tomography (CT) images constitutes an important step in the radiotherapy workflow. Artificial intelligence-based methods have significantly improved organ segmentation in medical images. Automatic segmentations are frequently evaluated using geometric metrics. Before a clinical implementation in the radiotherapy workflow, automatic segmentations must also be evaluated by clinicians. The aim of this study was to investigate the correlation between geometric metrics used for segmentation evaluation and the assessment performed by clinicians.</div></div><div><h3>Materials and Methods</h3><div>In this study, we used the U-Net model to segment the liver in CT images from a publicly available dataset. The model's performance was evaluated using two geometric metrics: the Dice similarity coefficient and the Hausdorff distance. Additionally, a qualitative evaluation was performed by clinicians who reviewed the automatic segmentations to rate their clinical acceptability for use in the radiotherapy workflow. The correlation between the geometric metrics and the clinicians’ evaluations was studied.</div></div><div><h3>Results</h3><div>The results showed that while the Dice coefficient and Hausdorff distance are reliable indicators of segmentation accuracy, they do not always align with clinician segmentation. In some cases, segmentations with high Dice scores still required clinician corrections before clinical use in the radiotherapy workflow.</div></div><div><h3>Conclusion</h3><div>This study highlights the need for more comprehensive evaluation metrics beyond geometric measures to assess the clinical acceptability of artificial intelligence-based segmentation. Although the deep learning model provided promising segmentation results, the present study shows that standardized validation methodologies are crucial for ensuring the clinical viability of automatic segmentation systems.</div></div>","PeriodicalId":9504,"journal":{"name":"Cancer Radiotherapie","volume":"29 4","pages":"Article 104648"},"PeriodicalIF":1.5000,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cancer Radiotherapie","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1278321825000642","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ONCOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose
Segmentation of target volumes and organs at risk on computed tomography (CT) images constitutes an important step in the radiotherapy workflow. Artificial intelligence-based methods have significantly improved organ segmentation in medical images. Automatic segmentations are frequently evaluated using geometric metrics. Before a clinical implementation in the radiotherapy workflow, automatic segmentations must also be evaluated by clinicians. The aim of this study was to investigate the correlation between geometric metrics used for segmentation evaluation and the assessment performed by clinicians.
Materials and Methods
In this study, we used the U-Net model to segment the liver in CT images from a publicly available dataset. The model's performance was evaluated using two geometric metrics: the Dice similarity coefficient and the Hausdorff distance. Additionally, a qualitative evaluation was performed by clinicians who reviewed the automatic segmentations to rate their clinical acceptability for use in the radiotherapy workflow. The correlation between the geometric metrics and the clinicians’ evaluations was studied.
Results
The results showed that while the Dice coefficient and Hausdorff distance are reliable indicators of segmentation accuracy, they do not always align with clinician segmentation. In some cases, segmentations with high Dice scores still required clinician corrections before clinical use in the radiotherapy workflow.
Conclusion
This study highlights the need for more comprehensive evaluation metrics beyond geometric measures to assess the clinical acceptability of artificial intelligence-based segmentation. Although the deep learning model provided promising segmentation results, the present study shows that standardized validation methodologies are crucial for ensuring the clinical viability of automatic segmentation systems.
期刊介绍:
Cancer/radiothérapie se veut d''abord et avant tout un organe francophone de publication des travaux de recherche en radiothérapie. La revue a pour objectif de diffuser les informations majeures sur les travaux de recherche en cancérologie et tout ce qui touche de près ou de loin au traitement du cancer par les radiations : technologie, radiophysique, radiobiologie et radiothérapie clinique.