Artificial intelligence-based analysis of whole-body bone scintigraphy: The quest for the optimal deep learning algorithm and comparison with human observer performance

Ghasem Hajianfar, Maziar Sabouri, Yazdan Salimi, Mehdi Amini, Soroush Bagheri, Elnaz Jenabi, Sepideh Hekmat, Mehdi Maghsudi, Zahra Mansouri, Maziar Khateri, Mohammad Hosein Jamshidi, Esmail Jafari, Ahmad Bitarafan Rajabi, Majid Assadi, Mehrdad Oveisi, Isaac Shiri, Habib Zaidi
DOI: 10.1016/j.zemedi.2023.01.008
Publication date: 2024-05-01
Article URL: https://www.sciencedirect.com/science/article/pii/S0939388923000089

Abstract

Purpose

Whole-body bone scintigraphy (WBS) is one of the most widely used modalities for diagnosing malignant bone diseases in their early stages. However, the procedure is time-consuming and demands sustained effort and experience. Moreover, interpreting WBS scans in the early stages of disease can be challenging because the patterns often resemble a normal appearance and are prone to subjective interpretation. To simplify this laborious, subjective, and error-prone task, we developed deep learning (DL) models to automate two major analyses, namely (i) classification of scans as normal or abnormal and (ii) discrimination between malignant and non-neoplastic bone diseases, and compared their performance with that of human observers.

Materials and Methods

After applying our exclusion criteria to 7188 patients from three different centers, 3772 and 2248 patients were enrolled for the first and second analyses, respectively. Data were split into training and testing sets, with a fraction of the training data held out for validation. Ten different convolutional neural network (CNN) models were applied in single-view and dual-view (posterior and anterior) input modes to find the optimal model for each analysis. In addition, three methods, namely squeeze-and-excitation (SE), spatial pyramid pooling (SPP), and attention-augmented (AA), were used to aggregate the features in dual-view input models. Model performance was reported as area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity, and models were compared using the DeLong test applied to ROC curves. The test dataset was also evaluated by three nuclear medicine physicians (NMPs) with different levels of experience to compare the performance of AI and human observers.
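
The four reported metrics can be illustrated with a minimal sketch. It is not the study's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for illustration. AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels: Sequence[int], scores: Sequence[float],
                      threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity at an assumed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Toy data (hypothetical): 1 = abnormal scan, 0 = normal scan.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_auc = auc_score(toy_labels, toy_scores)
toy_metrics = confusion_metrics(toy_labels, toy_scores)
```

AUC is threshold-free, whereas accuracy, sensitivity, and specificity depend on the chosen operating point, which is one reason the four are usually reported together.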

Results

DenseNet121_AA (DenseNet121 with dual-view input aggregated by AA) and InceptionResNetV2_SPP achieved the highest performance (AUC = 0.72) for the first and second analyses, respectively. On average, in the first analysis, the Inception V3 and InceptionResNetV2 CNN models and dual-view input with the AA aggregation method performed best. In the second analysis, DenseNet121 and InceptionResNetV2 as CNN models and dual-view input with the AA aggregation method achieved the best results. The performance of the AI models was significantly higher than that of human observers in the first analysis, whereas performance was comparable in the second analysis, although the AI model assessed the scans in drastically less time.
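
Significance statements about ROC curves, such as the one above, rest on the DeLong test named in the Methods. The following is a minimal sketch of the standard paired DeLong comparison of two AUCs measured on the same cases. It is not the authors' implementation: the function names and the toy scores are hypothetical, and the quadratic-time pairwise form is chosen for readability rather than speed.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Pairwise kernel: 1 if the positive outscores the negative, 0.5 on a tie."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos: List[float], neg: List[float]) -> Tuple[List[float], List[float]]:
    """Structural components V10 (per positive case) and V01 (per negative case)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(labels: Sequence[int], scores_a: Sequence[float],
                scores_b: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (auc_a, auc_b, z, two-sided p) for two models scored on the same cases."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos_idx), len(neg_idx)
    if m < 2 or n < 2:
        raise ValueError("need at least two positive and two negative cases")
    v10a, v01a = _components([scores_a[i] for i in pos_idx], [scores_a[i] for i in neg_idx])
    v10b, v01b = _components([scores_b[i] for i in pos_idx], [scores_b[i] for i in neg_idx])
    auc_a, auc_b = sum(v10a) / m, sum(v10b) / m
    # Variance of the AUC difference, accounting for the correlation between models.
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0.0:
        raise ValueError("zero variance: the test statistic is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return auc_a, auc_b, z, p


# Toy example (hypothetical scores from two models on the same eight scans).
toy_y = [1, 1, 1, 1, 0, 0, 0, 0]
toy_a = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]
toy_b = [0.6, 0.9, 0.3, 0.4, 0.7, 0.5, 0.2, 0.1]
toy_result = delong_test(toy_y, toy_a, toy_b)
```

Because both models are evaluated on the same test cases, their AUC estimates are correlated; the covariance terms are what distinguish this paired comparison from treating the two AUCs as independent.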

Conclusion

The models designed in this study represent a positive step toward improving and optimizing WBS interpretation. With training on larger and more diverse cohorts, DL models could potentially be used to assist physicians in the assessment of WBS images.

Source journal (as listed on this page): ACS Applied Bio Materials