Deep learning-based 3D classification of head and neck cancer PET/MRI: Radiologist comparison and Grad-CAM interpretability
Joonas Liedes, Jussi Hirvonen, Oona Rainio, Sarita Murtojärvi, Simona Malaspina, Riku Klén, Jukka Kemppainen
Clinical Physiology and Functional Imaging, vol. 45, no. 5, published 2025-09-25. DOI: 10.1111/cpf.70030
Abstract
Purpose
To develop and evaluate a three-dimensional convolutional neural network for automated classification of PET/MRI images in head and neck cancer (HNC) patients, assessing its performance against radiologist interpretation and its potential as a diagnostic aid.
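The abstract does not describe the network architecture. To illustrate what makes a convolutional network three-dimensional, the following minimal pure-Python sketch shows the core operation of a 3D convolutional layer: a small kernel that spans adjacent slices as well as in-plane pixels, so each output value combines information from a volumetric neighbourhood rather than from a single 2D slice. The names `conv3d_valid` and `relu3d` are hypothetical. Real models use framework layers (for example PyTorch's `torch.nn.Conv3d`) with many learned kernels, padding, and pooling, none of which is shown here.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as [depth][height][width]


def conv3d_valid(volume: Volume, kernel: Volume) -> Volume:
    """Slide a 3D kernel over a 3D volume (cross-correlation, stride 1, no padding)."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for z in range(d - kd + 1):
        plane = []
        for y in range(h - kh + 1):
            row = []
            for x in range(w - kw + 1):
                # Weighted sum over the kd x kh x kw neighbourhood: the kernel
                # reaches across neighbouring slices as well as in-plane voxels.
                acc = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def relu3d(volume: Volume) -> Volume:
    """Element-wise ReLU, the usual non-linearity applied after a convolution."""
    return [[[max(0.0, v) for v in row] for row in plane] for plane in volume]
```

For example, a 4×4×4 volume convolved with a 2×2×2 kernel yields a 3×3×3 output; a single bright voxel influences every output position whose kernel window overlaps it, across slices as well as within them.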
Methods
Data from 202 patients with HNC who underwent ¹⁸F-FDG PET/MRI were used to train and validate PET-, MRI-, and PET/MRI-based models. Of these, 101 patients were labelled as positive for HNC and 101 as negative. An additional test set of 20 patients (10 labelled positive, 10 negative) was also evaluated. Model performance was assessed using sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC). Grad-CAM (gradient-weighted class activation mapping) was used to improve interpretability, and classification results on the test set were compared with those of a radiologist.
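Grad-CAM weights each feature map of the last convolutional layer by the average gradient of the class score with respect to that map, sums the weighted maps, and keeps only the positive part. Below is a minimal pure-Python sketch of that computation for 3D feature maps. It assumes the activations and gradients have already been extracted from a trained model (in practice through the framework's automatic differentiation); `grad_cam_3d` is a hypothetical name, and the usual final step of upsampling the coarse map to the PET/MRI voxel grid is omitted.

```python
from typing import List

Volume = List[List[List[float]]]  # [depth][height][width]


def grad_cam_3d(activations: List[Volume], gradients: List[Volume]) -> Volume:
    """Grad-CAM heat map for one class from the last 3D conv layer.

    activations[k] is feature map A_k; gradients[k] is d(class score)/dA_k,
    which in practice comes from a framework's autodiff (not computed here).
    """
    d = len(activations[0])
    h = len(activations[0][0])
    w = len(activations[0][0][0])
    n_voxels = d * h * w
    # Channel weight alpha_k: the gradient averaged over all voxels
    # (global average pooling of the gradient map).
    alphas = [
        sum(v for plane in g for row in plane for v in row) / n_voxels
        for g in gradients
    ]
    # Weighted sum of feature maps, then ReLU to keep only evidence
    # that increases the class score.
    cam = [[[max(0.0, sum(a * act[z][y][x] for a, act in zip(alphas, activations)))
             for x in range(w)] for y in range(h)] for z in range(d)]
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(v for plane in cam for row in plane for v in row)
    if peak > 0:
        cam = [[[v / peak for v in row] for row in plane] for plane in cam]
    return cam
```

Overlaying the normalised map on the input volume is what lets a reader check whether a classification rests on a plausible region or on unrelated signal.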
Results
The PET-based model achieved an AUC of 0.92 on the test set, with an accuracy of 90%, a sensitivity of 100%, and a specificity of 80%. The PET/MRI- and MRI-based models underperformed relative to the PET-based model. The radiologist achieved perfect classification accuracy. Grad-CAM analysis showed that the model's classifications were based on genuine areas of interest. It also gave valuable insight into identifying false-positive findings when using similar systems.
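Because the test set has 10 positive and 10 negative patients, the reported threshold-based figures fix the confusion matrix: 100% sensitivity means TP = 10 and FN = 0, and 80% specificity means TN = 8 and FP = 2, which gives (10 + 8) / 20 = 90% accuracy, consistent with the reported value. The sketch below checks this arithmetic. It also shows the rank-based (Mann-Whitney) form of AUC, which does not depend on a single decision threshold; the per-patient model scores behind the reported AUC of 0.92 are not given in the abstract, so `auc_rank` is illustrated only with made-up scores. All function names are hypothetical.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: share of positive patients classified as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: share of negative patients classified as negative."""
    return tn / (tn + fp)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all patients classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def auc_rank(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive case scores higher
    than a random negative case (Mann-Whitney form); ties count as half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Test-set counts implied by the reported figures (10 positive, 10 negative):
# sensitivity 100% -> TP = 10, FN = 0; specificity 80% -> TN = 8, FP = 2.
TP, FN, TN, FP = 10, 0, 8, 2
```

With these counts, `accuracy(TP, TN, FP, FN)` evaluates to 0.9. The two false positives lower specificity and accuracy, but AUC can still be high as long as most positive cases receive higher scores than most negative cases.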
Conclusion
The PET-based model demonstrated high sensitivity, indicating its potential as a pre-screening tool for HNC. However, specificity requires improvement to reduce false-positive rates. Enhanced datasets and refinement of model architecture will be crucial before clinical adoption. Grad-CAM provides valuable insights into model decisions, aiding clinical integration.