Yun Ming Wong, Wen Siang Lew, James Cheow Lei Lee, Hong Qi Tan
Performance evaluation of deformable image registration algorithms: target registration error and its correlation to Dice similarity coefficient
Zeitschrift fur medizinische Physik, published 2025-09-12. DOI: 10.1016/j.zemedi.2025.09.001 (https://doi.org/10.1016/j.zemedi.2025.09.001)
Citation count: 0
Abstract
Objective: The widespread use of deformable image registration (DIR) makes quality assurance essential for reliable clinical translation. This work aimed primarily to compare the performance of four DIR software packages in terms of voxel mapping accuracy, quantified through target registration error (TRE), and to examine the organ-wise correlation of TRE with the Dice similarity coefficient (DSC), a widely used segmentation metric.
Methods: CT scans were acquired for one static scenario and four deformation scenarios simulated using an in-house deformable anthropomorphic pelvis phantom. Their CT numbers were overridden based on an actual patient scan, and the overridden scans were used as the input images in this study. Four DIR software packages were tested: RayStation v10B, Velocity v4.1, Slicer, and Plastimatch. Multiple DIRs were performed with each package, using different algorithm options or parameters. TRE was quantified as the difference between the true and mapped marker positions. Pearson correlation tests were then performed to examine the correlation between DSC and mean TRE, separately for the bladder, prostate, and rectum and for all organs combined. Similar analyses were conducted for the prostate alone to gain further insight into a homogeneous medium. Additionally, DSC was used to predict whether the mean TRE exceeded 3 mm. Classification performance was assessed using accuracy, precision, recall, F1-score, specificity, and the area under the receiver operating characteristic curve (AUC).
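The quantities named in the Methods can be illustrated with a minimal sketch. It assumes that the "difference" between true and mapped marker positions is the Euclidean distance in millimetres and that organ masks are given as flat sequences of 0/1 voxel labels; neither detail is stated in the abstract. The function names are hypothetical and are not taken from any of the four software packages.

```python
import math


def target_registration_error(true_pts, mapped_pts):
    """Per-marker TRE, assumed here to be the Euclidean distance (mm)
    between each true marker position and its DIR-mapped position."""
    return [math.dist(t, m) for t, m in zip(true_pts, mapped_pts)]


def dice_coefficient(mask_a, mask_b):
    """DSC = 2|A and B| / (|A| + |B|) for two binary masks given as
    equal-length flat sequences of 0/1 voxel labels."""
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks are treated as a perfect match (a convention, not from the paper).
    return 1.0 if total == 0 else 2.0 * overlap / total


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

In this sketch, each registration would contribute one (DSC, mean TRE) pair per organ, and `pearson_r` would be applied to those pairs; a coefficient near -1 means that higher contour overlap goes with lower marker error.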
Results: Among the four software packages tested, RayStation achieved the lowest mean TRE in all deformation scenarios, with values between 1.48 mm and 3.06 mm. Pearson correlation tests revealed an exceptionally strong negative correlation between DSC and mean TRE for SlicerElastix, with correlation coefficients ranging from -0.901 to -0.987. Consistent with this strongest correlation, SlicerElastix achieved the highest classification performance scores overall. For all three organs, the scores at the corresponding best DSC threshold were mostly higher than 0.80, and the AUCs were close to 1.
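The DSC-based classification can be sketched as follows. The sketch assumes that a registration is flagged as "mean TRE above 3 mm" when its DSC falls below a chosen threshold, which is the natural direction given the negative correlation but is not spelled out in the abstract. The function names are hypothetical, and the criterion the authors used to pick the best DSC threshold is not given, so the threshold is left as an input.

```python
def classification_scores(dsc, mean_tre, dsc_threshold, tre_limit=3.0):
    """Score the rule 'DSC < dsc_threshold predicts mean TRE > tre_limit'.
    The direction of the rule is an assumption of this sketch."""
    tp = fp = tn = fn = 0
    for d, t in zip(dsc, mean_tre):
        predicted, actual = d < dsc_threshold, t > tre_limit
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def auc_from_dsc(dsc, mean_tre, tre_limit=3.0):
    """AUC as the probability that a case with mean TRE > tre_limit has a
    lower DSC than a case within the limit (ties count as one half)."""
    pos = [d for d, t in zip(dsc, mean_tre) if t > tre_limit]
    neg = [d for d, t in zip(dsc, mean_tre) if t <= tre_limit]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The AUC does not depend on any single threshold, so it summarizes how well DSC separates the two groups overall, while the other scores describe one specific operating point.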
Conclusion: This work quantified and compared four DIR software packages in terms of voxel mapping accuracy and its correlation with DSC in the major organs relevant to prostate radiotherapy.