Objective quality metrics for algorithm evaluation in Computed Tomography
Christian Navarro-Castellanos, R. Orozco-Morales, Yakdiel Rodriguez-Gallo
2022 IEEE 40th Central America and Panama Convention (CONCAPAN), published 2022-11-09
DOI: 10.1109/CONCAPAN48024.2022.9997656
Abstract
In recent years, new image reconstruction algorithms for computed tomography (CT) have been proposed with the aim of optimizing image quality (IQ) while reducing the radiation dose delivered to patients. This growing number of algorithms makes it necessary to evaluate and standardize objective quality metrics that can quantify differences in Hounsfield units (HU). This paper compares the performance of fifteen full-reference (FR) IQ metrics against the judgment of specialists. Images of a phantom were modified to emulate the defects caused by noise, blurring, and the loss of spatial resolution that results from reducing the number of projections used in reconstruction. The correlation between the FR-IQ metrics and the scores assigned by radiologists was measured with Spearman’s nonparametric rank-order correlation coefficient and Kendall’s rank-order correlation coefficient, and Cohen’s kappa was used to assess interobserver agreement. The best-performing metrics $(p < 0.001)$ were the Most Apparent Distortion (MAD), Structural Similarity Index (SSIM), Information Content Weighted SSIM (IW-SSIM), Feature Similarity Index Measure (FSIM), Information Content Weighted Mean Square Error (IW-MSE), Information Content Weighted Peak Signal-to-Noise Ratio (IW-PSNR), Optimal Scale Selection Structural Similarity Index (OSS-SSIM), Riesz transform and Visual contrast sensitivity-based feature SIMilarity index (RVSIM), and Spectral Residual Based Similarity (SR-SIM). The worst results were obtained by the Noise Quality Measure (NQM) $(p = 0.051)$ and the Weighted Signal-to-Noise Ratio (WSNR) $(p = 0.829)$. The results show that studies evaluating newly developed reconstruction methods could use these objective quality metrics in place of expert judgment.
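In a full-reference setup, each degraded image is scored against the undegraded phantom image. The following is a minimal sketch under several assumptions that are not taken from the paper: a hypothetical 1-D HU profile through a phantom (air, water, a bone-like insert), a 12-bit HU range of 4095 used as the data range, and simple stand-ins for two of the degradations (additive Gaussian noise and a 3-tap moving-average blur); reduced-projection artifacts are not emulated. The SSIM shown is a single-window simplification of the windowed SSIM, not the IW-SSIM or OSS-SSIM variants the study evaluated, and MSE and PSNR are included because IW-MSE and IW-PSNR build on these quantities.

```python
import random
from math import inf, log10

# Hypothetical 1-D HU profile through a phantom: air, water, bone-like insert, water, air.
REFERENCE = [-1000.0] * 4 + [0.0] * 8 + [1000.0] * 4 + [0.0] * 8 + [-1000.0] * 4
# Assumed dynamic range: 12-bit CT numbers from -1024 to 3071 HU.
DATA_RANGE = 4095.0


def add_gaussian_noise(image, sigma_hu, seed=0):
    """Emulate image noise with zero-mean Gaussian noise of std sigma_hu (in HU)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma_hu) for v in image]


def box_blur(profile):
    """Crude blur: 3-tap moving average with the edge values replicated."""
    padded = [profile[0], *profile, profile[-1]]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3 for i in range(1, len(padded) - 1)]


def mse(ref, test):
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)


def psnr(ref, test, data_range):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(ref, test)
    return inf if err == 0 else 10 * log10(data_range ** 2 / err)


def global_ssim(ref, test, data_range):
    """SSIM evaluated over the whole signal as one window (simplified)."""
    c1 = (0.01 * data_range) ** 2  # K1 = 0.01
    c2 = (0.03 * data_range) ** 2  # K2 = 0.03
    n = len(ref)
    mu_x, mu_y = sum(ref) / n, sum(test) / n
    var_x = sum((r - mu_x) ** 2 for r in ref) / (n - 1)
    var_y = sum((t - mu_y) ** 2 for t in test) / (n - 1)
    cov = sum((r - mu_x) * (t - mu_y) for r, t in zip(ref, test)) / (n - 1)
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


if __name__ == "__main__":
    degraded = {
        "noise sigma=10 HU": add_gaussian_noise(REFERENCE, 10.0),
        "noise sigma=40 HU": add_gaussian_noise(REFERENCE, 40.0),
        "blur": box_blur(REFERENCE),
    }
    for name, image in degraded.items():
        print(f"{name}: PSNR={psnr(REFERENCE, image, DATA_RANGE):.1f} dB, "
              f"SSIM={global_ssim(REFERENCE, image, DATA_RANGE):.4f}")
```

Fixing the seed makes the two noise levels differ only in scale, so the PSNR drop between them reflects the noise level alone.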
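Spearman’s and Kendall’s coefficients compare orderings rather than raw values, which suits ordinal radiologist scores. A minimal pure-Python sketch, assuming one metric value and one (hypothetical) radiologist score per degraded image; it computes Spearman’s rho with tie-averaged ranks and Kendall’s tau-b, and omits the significance tests behind the reported p-values. In practice, `scipy.stats.spearmanr` and `scipy.stats.kendalltau` return these coefficients together with p-values. For metrics where lower is better, such as MSE or MAD, a good metric shows a strong negative correlation.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` across the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (P - Q) / sqrt((P + Q + T) * (P + Q + U)).

    P/Q count concordant/discordant pairs; T/U count pairs tied only in x / only in y.
    """
    p = q = t = u = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            t += 1
        elif dy == 0:
            u += 1
        elif (dx > 0) == (dy > 0):
            p += 1
        else:
            q += 1
    return (p - q) / sqrt((p + q + t) * (p + q + u))


if __name__ == "__main__":
    # Hypothetical data: one metric value and one radiologist score (1-5) per degraded image.
    ssim_values = [0.95, 0.90, 0.82, 0.75, 0.60, 0.40]
    radiologist_scores = [5, 5, 4, 3, 2, 1]
    rho = spearman_rho(ssim_values, radiologist_scores)
    tau = kendall_tau_b(ssim_values, radiologist_scores)
    print(f"rho={rho:.3f} tau={tau:.3f}")  # prints rho=0.986 tau=0.966
```

Kendall’s tau-b is used here because tied radiologist scores are common on a short ordinal scale; it is also the default variant in `scipy.stats.kendalltau`.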
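Cohen’s kappa corrects the raw agreement rate between two raters for the agreement expected by chance, $\kappa = (p_o - p_e)/(1 - p_e)$. A minimal sketch of the unweighted form, assuming two radiologists scoring the same images on a hypothetical 1 to 5 scale; the abstract does not state whether a weighted variant was used for the ordinal scores, so this is an illustration rather than the study’s exact procedure.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given the same score by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e: sum over categories of the product of marginal proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical 1-5 quality scores from two radiologists for eight degraded images.
    radiologist_1 = [5, 4, 4, 3, 2, 1, 3, 5]
    radiologist_2 = [5, 4, 3, 3, 2, 1, 3, 4]
    print(f"kappa={cohens_kappa(radiologist_1, radiologist_2):.2f}")  # prints kappa=0.68
```

Here the raters agree on 6 of 8 images ($p_o = 0.75$), but chance alone would produce $p_e = 14/64$, which pulls kappa down to 0.68.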