Development and evaluation of deep learning models for detecting and classifying various bone tumours in full-field limb radiographs using automated object detection models.
{"title":"Development and evaluation of deep learning models for detecting and classifying various bone tumours in full-field limb radiographs using automated object detection models.","authors":"Masashi Yamana, Ryoma Bise, Makoto Endo, Tomoya Matsunobu, Nokitaka Setsu, Nobuhiko Yokoyama, Yasuharu Nakashima, Seiichi Uchida","doi":"10.1302/2046-3758.149.BJR-2024-0505.R1","DOIUrl":null,"url":null,"abstract":"<p><strong>Aims: </strong>We aim to develop a fully automated deep-learning model to detect and classify benign/malignant bone tumours in full-field limb radiographs using an object detection model. The secondary aim is to identify differences in classification characteristics between the developed automated model, three orthopaedic oncologists, and three general orthopaedic surgeons.</p><p><strong>Methods: </strong>This retrospective analysis included 642 limb bone tumours with 40 diagnoses confirmed pathologically from three institutions (378 benign, 264 malignant including intermediate types). We employed end-to-end object Detection with transformers with Improved deNoising anchOr boxes (DINO) and You Only Look Once (YOLO) models. We performed five-fold cross validation on the collected radiographs, using the training data to train the models, validation data to optimize the models' parameters, and independent test data for final performance evaluation. Firstly, we confirmed DINO achieves a higher detection rate than YOLO. Secondly, we compared the classification performance of DINO with those of doctors, using various metrics such as accuracy, sensitivity, specificity, precision, and F-measure.</p><p><strong>Results: </strong>The DINO model achieved a higher mean tumour detection rate (85.7% (95% CI 81.5 to 89.8)) than the YOLO model (80.1% (95% CI 77.2 to 82.9)). For the evaluation of classification performance, we used 113 cases that DINO detected out of 128 randomly selected cases as the evaluation test set. 
The accuracy and sensitivity of the DINO model, as a superior model, were significantly higher than those of general orthopaedic surgeons. The DINO model correctly classified 78.6% (22 out of 28 cases) of the challenging cases that two or more doctors misclassified. However, DINO's diagnostic errors primarily occurred with tumours that were diagnostically challenging for orthopaedic oncologists or present in unusual sites.</p><p><strong>Conclusion: </strong>The DINO model automatically detects bone tumours better than the YOLO model, and may assist doctors in detecting tumours and classifying malignant/benign bone tumours in clinical practice.</p>","PeriodicalId":9074,"journal":{"name":"Bone & Joint Research","volume":"14 9","pages":"760-768"},"PeriodicalIF":5.1000,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401592/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bone & Joint Research","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1302/2046-3758.149.BJR-2024-0505.R1","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CELL & TISSUE ENGINEERING","Score":null,"Total":0}
Abstract
Aims: We aim to develop a fully automated deep-learning model to detect and classify benign/malignant bone tumours in full-field limb radiographs using an object detection model. The secondary aim is to identify differences in classification characteristics between the developed automated model, three orthopaedic oncologists, and three general orthopaedic surgeons.
Methods: This retrospective analysis included 642 limb bone tumours, spanning 40 pathologically confirmed diagnoses, from three institutions (378 benign, 264 malignant including intermediate types). We employed the DINO (DETR with Improved deNoising anchOr boxes) and YOLO (You Only Look Once) object detection models. We performed five-fold cross-validation on the collected radiographs, using the training data to train the models, the validation data to optimize the models' parameters, and independent test data for the final performance evaluation. First, we confirmed that DINO achieves a higher detection rate than YOLO. Second, we compared the classification performance of DINO with that of the doctors, using metrics such as accuracy, sensitivity, specificity, precision, and F-measure.
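The classification metrics listed above all derive from a 2×2 confusion matrix over benign/malignant labels. A minimal sketch of how they are typically computed (the labels below are synthetic, not the study's data):

```python
def classification_metrics(y_true, y_pred):
    """Metrics from binary labels (1 = malignant, 0 = benign).
    Synthetic illustration, not the study's implementation."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on malignant
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    return dict(accuracy=accuracy, sensitivity=sensitivity,
                specificity=specificity, precision=precision,
                f_measure=f_measure)

# Toy example: eight cases, one false negative and one false positive
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
m = classification_metrics(y_true, y_pred)
print(m)  # every metric is 0.75 for this toy split
```

Sensitivity here is the fraction of malignant tumours correctly flagged, the clinically critical quantity when a missed malignancy is the costly error.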
Results: The DINO model achieved a higher mean tumour detection rate (85.7% (95% CI 81.5 to 89.8)) than the YOLO model (80.1% (95% CI 77.2 to 82.9)). For the evaluation of classification performance, we used as the test set the 113 cases that DINO detected out of 128 randomly selected cases. The accuracy and sensitivity of the DINO model, selected as the superior detector, were significantly higher than those of the general orthopaedic surgeons. The DINO model correctly classified 78.6% (22 out of 28) of the challenging cases that two or more doctors misclassified. However, DINO's diagnostic errors occurred primarily with tumours that were diagnostically challenging even for orthopaedic oncologists or that presented in unusual sites.
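The paper reports the mean detection rate with a 95% CI over the five cross-validation folds. The abstract does not state which interval method was used; one common choice is a t-based interval on the per-fold rates, sketched here with hypothetical fold values (not the study's actual numbers):

```python
import math

# Hypothetical per-fold detection rates (illustrative only,
# NOT the study's actual fold-level results)
fold_rates = [0.84, 0.88, 0.83, 0.87, 0.86]

n = len(fold_rates)
mean = sum(fold_rates) / n
# Sample standard deviation (ddof = 1)
sd = math.sqrt(sum((r - mean) ** 2 for r in fold_rates) / (n - 1))
# Two-sided 95% t critical value for df = n - 1 = 4
t_crit = 2.776
half = t_crit * sd / math.sqrt(n)
print(f"mean {mean:.1%} (95% CI {mean - half:.1%} to {mean + half:.1%})")
```

Reporting the interval across folds, rather than a single-split binomial CI, captures how much the detection rate varies with the choice of training/test partition.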
Conclusion: The DINO model automatically detects bone tumours better than the YOLO model, and may assist doctors in detecting tumours and classifying malignant/benign bone tumours in clinical practice.