Mehdi Amini, Yazdan Salimi, Ghasem Hajianfar, Ismini Mainta, Elsa Hervier, Amirhossein Sanaat, Arman Rahmim, Isaac Shiri, Habib Zaidi
Clinical Nuclear Medicine. Published 2024-12-01 (Epub 2024-10-21). DOI: 10.1097/RLU.0000000000005526 (https://doi.org/10.1097/RLU.0000000000005526)
Citations: 0
Abstract
Fully Automated Region-Specific Human-Perceptive-Equivalent Image Quality Assessment: Application to 18F-FDG PET Scans.
Introduction: We propose a fully automated framework to conduct a region-wise image quality assessment (IQA) on whole-body 18F-FDG PET scans. This framework (1) can be valuable in daily clinical image acquisition procedures to instantly recognize low-quality scans for potential rescanning and/or image reconstruction, and (2) can make a significant impact in dataset collection for the development of artificial intelligence-driven 18F-FDG PET analysis models by rejecting low-quality images and those presenting with artifacts, toward building clean datasets.
Patients and methods: Two experienced nuclear medicine physicians separately evaluated the quality of 174 18F-FDG PET images from 87 patients, for each body region, on a 5-point Likert scale. The body regions were: (1) the head and neck, including the brain, (2) the chest, (3) the chest-abdomen interval (diaphragmatic region), (4) the abdomen, and (5) the pelvis. Intrareader and interreader reproducibility of the quality scores was calculated using 39 randomly selected scans from the dataset. Images were dichotomized into low-quality versus high-quality for physician quality scores ≤3 versus >3, respectively. Taking 18F-FDG PET/CT scans as input, our proposed fully automated framework applies 2 deep learning (DL) models to the CT images to perform region identification and whole-body contour extraction (excluding extremities), then classifies each PET region as low or high quality. For classification, 2 mainstream artificial intelligence-driven approaches were investigated: machine learning (ML) from radiomic features and DL. All models were trained and evaluated on the scores attributed by each physician and on the average of the scores. The radiomics-ML and DL models were evaluated on the same test dataset using the area under the curve, accuracy, sensitivity, and specificity, and were compared using the DeLong test, with P values <0.05 regarded as statistically significant.
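The dichotomization rule and the four evaluation metrics named in the methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy scores, the 0.5 decision threshold, and the choice to treat "low quality" as the positive class are all assumptions made for illustration. The AUC is computed here through its rank interpretation (the probability that a randomly chosen low-quality scan receives a higher model score than a randomly chosen high-quality scan), and both classes are assumed to be present.

```python
from typing import Dict, List, Sequence


def dichotomize(likert_scores: Sequence[int], threshold: int = 3) -> List[int]:
    """Map 5-point Likert scores to labels: 1 = low quality (score <= threshold), 0 = high quality."""
    return [1 if score <= threshold else 0 for score in likert_scores]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity, with label 1 (low quality) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): physician Likert scores and model probabilities of "low quality".
likert = [5, 4, 3, 2, 4, 1, 3, 5]
model_prob_low = [0.1, 0.4, 0.8, 0.9, 0.6, 0.7, 0.3, 0.2]

labels = dichotomize(likert)                                    # [0, 0, 1, 1, 0, 1, 1, 0]
predictions = [1 if p >= 0.5 else 0 for p in model_prob_low]    # assumed 0.5 cut-off
metrics = confusion_metrics(labels, predictions)
metrics["auc"] = auc(labels, model_prob_low)
print(metrics)  # accuracy 0.75, sensitivity 0.75, specificity 0.75, auc 0.875
```

In this sketch the AUC depends only on the ranking of the model scores, whereas accuracy, sensitivity, and specificity depend on the chosen cut-off, so the two kinds of metric can disagree. The DeLong comparison between models is not reproduced here.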
Results: In the head and neck, chest, chest-abdomen interval, abdomen, and pelvis regions, the best models achieved area under the curve, accuracy, sensitivity, and specificity of [0.97, 0.95, 0.96, 0.95], [0.85, 0.82, 0.87, 0.76], [0.83, 0.76, 0.68, 0.80], [0.73, 0.72, 0.64, 0.77], and [0.72, 0.68, 0.70, 0.67], respectively. In all regions, models achieved their highest performance when developed on the quality scores with higher intrareader reproducibility. Comparison of DL and radiomics-ML models did not show any statistically significant differences, though DL models showed an overall trend toward better performance.
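To make the per-region results easier to compare, the reported values can be placed in a small lookup table and ranked by AUC. The sketch below only transcribes the numbers from the Results paragraph; the variable and function names are hypothetical, and no new results are computed.

```python
from typing import Dict, List, Tuple

# Values transcribed from the Results paragraph: (AUC, accuracy, sensitivity, specificity).
REPORTED: Dict[str, Tuple[float, float, float, float]] = {
    "head and neck": (0.97, 0.95, 0.96, 0.95),
    "chest": (0.85, 0.82, 0.87, 0.76),
    "chest-abdomen interval": (0.83, 0.76, 0.68, 0.80),
    "abdomen": (0.73, 0.72, 0.64, 0.77),
    "pelvis": (0.72, 0.68, 0.70, 0.67),
}


def rank_by_auc(results: Dict[str, Tuple[float, float, float, float]]) -> List[str]:
    """Return region names sorted from highest to lowest reported AUC."""
    return sorted(results, key=lambda region: results[region][0], reverse=True)


ranking = rank_by_auc(REPORTED)
for region in ranking:
    auc_value, accuracy, sensitivity, specificity = REPORTED[region]
    print(f"{region:<24} AUC={auc_value:.2f} acc={accuracy:.2f} "
          f"sens={sensitivity:.2f} spec={specificity:.2f}")
```

Ranked this way, the reported AUC falls steadily from the head and neck to the pelvis, which is consistent with the authors' point that body regions need separate models.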
Conclusions: We developed a fully automated, human-perceptive-equivalent model to conduct region-wise IQA on 18F-FDG PET images. Our analysis emphasizes the necessity of developing separate models for individual body regions and of performing data annotation based on multiple experts' consensus in IQA studies.
About the journal:
Clinical Nuclear Medicine is a comprehensive and current resource for professionals in the field of nuclear medicine. It caters to both generalists and specialists, offering valuable insights on how to effectively apply nuclear medicine techniques in various clinical scenarios. With a focus on timely dissemination of information, this journal covers the latest developments that impact all aspects of the specialty.
Geared towards practitioners, Clinical Nuclear Medicine is the ultimate practice-oriented publication in the field of nuclear imaging. Its informative articles are complemented by numerous illustrations that demonstrate how physicians can seamlessly integrate the knowledge gained into their everyday practice.