Ghasem Hajianfar , Maziar Sabouri , Yazdan Salimi , Mehdi Amini , Soroush Bagheri , Elnaz Jenabi , Sepideh Hekmat , Mehdi Maghsudi , Zahra Mansouri , Maziar Khateri , Mohammad Hosein Jamshidi , Esmail Jafari , Ahmad Bitarafan Rajabi , Majid Assadi , Mehrdad Oveisi , Isaac Shiri , Habib Zaidi
{"title":"基于人工智能的全身骨扫描分析:探索最佳深度学习算法并与人类观察者的表现进行比较","authors":"Ghasem Hajianfar , Maziar Sabouri , Yazdan Salimi , Mehdi Amini , Soroush Bagheri , Elnaz Jenabi , Sepideh Hekmat , Mehdi Maghsudi , Zahra Mansouri , Maziar Khateri , Mohammad Hosein Jamshidi , Esmail Jafari , Ahmad Bitarafan Rajabi , Majid Assadi , Mehrdad Oveisi , Isaac Shiri , Habib Zaidi","doi":"10.1016/j.zemedi.2023.01.008","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><p>Whole-body bone scintigraphy (WBS) is one of the most widely used modalities in diagnosing malignant bone diseases during the early stages. However, the procedure is time-consuming and requires vigour and experience. Moreover, interpretation of WBS scans in the early stages of the disorders might be challenging because the patterns often reflect normal appearance that is prone to subjective interpretation. To simplify the gruelling, subjective, and prone-to-error task of interpreting WBS scans, we developed deep learning (DL) models to automate two major analyses, namely (i) classification of scans into normal and abnormal and (ii) discrimination between malignant and non-neoplastic bone diseases, and compared their performance with human observers.</p></div><div><h3>Materials and Methods</h3><p>After applying our exclusion criteria on 7188 patients from three different centers, 3772 and 2248 patients were enrolled for the first and second analyses, respectively. Data were split into two parts, including training and testing, while a fraction of training data were considered for validation. Ten different CNN models were applied to single- and dual-view input (posterior and anterior views) modes to find the optimal model for each analysis. In addition, three different methods, including squeeze-and-excitation (SE), spatial pyramid pooling (SPP), and attention-augmented (AA), were used to aggregate the features for dual-view input models. Model performance was reported through area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity and was compared with the DeLong test applied to ROC curves. The test dataset was evaluated by three nuclear medicine physicians (NMPs) with different levels of experience to compare the performance of AI and human observers.</p></div><div><h3>Results</h3><p>DenseNet121_AA (DensNet121, with dual-view input aggregated by AA) and InceptionResNetV2_SPP achieved the highest performance (AUC = 0.72) for the first and second analyses, respectively. Moreover, on average, in the first analysis, Inception V3 and InceptionResNetV2 CNN models and dual-view input with AA aggregating method had superior performance. In addition, in the second analysis, DenseNet121 and InceptionResNetV2 as CNN methods and dual-view input with AA aggregating method achieved the best results. Conversely, the performance of AI models was significantly higher than human observers for the first analysis, whereas their performance was comparable in the second analysis, although the AI model assessed the scans in a drastically lower time.</p></div><div><h3>Conclusion</h3><p>Using the models designed in this study, a positive step can be taken toward improving and optimizing WBS interpretation. By training DL models with larger and more diverse cohorts, AI could potentially be used to assist physicians in the assessment of WBS images.</p></div>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0939388923000089/pdfft?md5=40da3cacf80f682e80f4655f04f990de&pid=1-s2.0-S0939388923000089-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Artificial intelligence-based analysis of whole-body bone scintigraphy: The quest for the optimal deep learning algorithm and comparison with human observer performance\",\"authors\":\"Ghasem Hajianfar , Maziar Sabouri , Yazdan Salimi , Mehdi Amini , Soroush Bagheri , Elnaz Jenabi , Sepideh Hekmat , Mehdi Maghsudi , Zahra Mansouri , Maziar Khateri , Mohammad Hosein Jamshidi , Esmail Jafari , Ahmad Bitarafan Rajabi , Majid Assadi , Mehrdad Oveisi , Isaac Shiri , Habib Zaidi\",\"doi\":\"10.1016/j.zemedi.2023.01.008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><h3>Purpose</h3><p>Whole-body bone scintigraphy (WBS) is one of the most widely used modalities in diagnosing malignant bone diseases during the early stages. However, the procedure is time-consuming and requires vigour and experience. Moreover, interpretation of WBS scans in the early stages of the disorders might be challenging because the patterns often reflect normal appearance that is prone to subjective interpretation. To simplify the gruelling, subjective, and prone-to-error task of interpreting WBS scans, we developed deep learning (DL) models to automate two major analyses, namely (i) classification of scans into normal and abnormal and (ii) discrimination between malignant and non-neoplastic bone diseases, and compared their performance with human observers.</p></div><div><h3>Materials and Methods</h3><p>After applying our exclusion criteria on 7188 patients from three different centers, 3772 and 2248 patients were enrolled for the first and second analyses, respectively. Data were split into two parts, including training and testing, while a fraction of training data were considered for validation. Ten different CNN models were applied to single- and dual-view input (posterior and anterior views) modes to find the optimal model for each analysis. In addition, three different methods, including squeeze-and-excitation (SE), spatial pyramid pooling (SPP), and attention-augmented (AA), were used to aggregate the features for dual-view input models. Model performance was reported through area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity and was compared with the DeLong test applied to ROC curves. The test dataset was evaluated by three nuclear medicine physicians (NMPs) with different levels of experience to compare the performance of AI and human observers.</p></div><div><h3>Results</h3><p>DenseNet121_AA (DensNet121, with dual-view input aggregated by AA) and InceptionResNetV2_SPP achieved the highest performance (AUC = 0.72) for the first and second analyses, respectively. Moreover, on average, in the first analysis, Inception V3 and InceptionResNetV2 CNN models and dual-view input with AA aggregating method had superior performance. In addition, in the second analysis, DenseNet121 and InceptionResNetV2 as CNN methods and dual-view input with AA aggregating method achieved the best results. Conversely, the performance of AI models was significantly higher than human observers for the first analysis, whereas their performance was comparable in the second analysis, although the AI model assessed the scans in a drastically lower time.</p></div><div><h3>Conclusion</h3><p>Using the models designed in this study, a positive step can be taken toward improving and optimizing WBS interpretation. By training DL models with larger and more diverse cohorts, AI could potentially be used to assist physicians in the assessment of WBS images.</p></div>\",\"PeriodicalId\":2,\"journal\":{\"name\":\"ACS Applied Bio Materials\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0939388923000089/pdfft?md5=40da3cacf80f682e80f4655f04f990de&pid=1-s2.0-S0939388923000089-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACS Applied Bio Materials\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0939388923000089\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"MATERIALS SCIENCE, BIOMATERIALS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"3","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0939388923000089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}
Artificial intelligence-based analysis of whole-body bone scintigraphy: The quest for the optimal deep learning algorithm and comparison with human observer performance
Purpose
Whole-body bone scintigraphy (WBS) is one of the most widely used modalities in diagnosing malignant bone diseases during the early stages. However, the procedure is time-consuming and requires vigour and experience. Moreover, interpretation of WBS scans in the early stages of the disorders might be challenging because the patterns often reflect normal appearance that is prone to subjective interpretation. To simplify the gruelling, subjective, and prone-to-error task of interpreting WBS scans, we developed deep learning (DL) models to automate two major analyses, namely (i) classification of scans into normal and abnormal and (ii) discrimination between malignant and non-neoplastic bone diseases, and compared their performance with human observers.
Materials and Methods
After applying our exclusion criteria on 7188 patients from three different centers, 3772 and 2248 patients were enrolled for the first and second analyses, respectively. Data were split into two parts, including training and testing, while a fraction of training data were considered for validation. Ten different CNN models were applied to single- and dual-view input (posterior and anterior views) modes to find the optimal model for each analysis. In addition, three different methods, including squeeze-and-excitation (SE), spatial pyramid pooling (SPP), and attention-augmented (AA), were used to aggregate the features for dual-view input models. Model performance was reported through area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity and was compared with the DeLong test applied to ROC curves. The test dataset was evaluated by three nuclear medicine physicians (NMPs) with different levels of experience to compare the performance of AI and human observers.
Results
DenseNet121_AA (DensNet121, with dual-view input aggregated by AA) and InceptionResNetV2_SPP achieved the highest performance (AUC = 0.72) for the first and second analyses, respectively. Moreover, on average, in the first analysis, Inception V3 and InceptionResNetV2 CNN models and dual-view input with AA aggregating method had superior performance. In addition, in the second analysis, DenseNet121 and InceptionResNetV2 as CNN methods and dual-view input with AA aggregating method achieved the best results. Conversely, the performance of AI models was significantly higher than human observers for the first analysis, whereas their performance was comparable in the second analysis, although the AI model assessed the scans in a drastically lower time.
Conclusion
Using the models designed in this study, a positive step can be taken toward improving and optimizing WBS interpretation. By training DL models with larger and more diverse cohorts, AI could potentially be used to assist physicians in the assessment of WBS images.