Enhanced Vision Transformer with Custom Attention Mechanism for Automated Idiopathic Scoliosis Classification

Nevzat Yeşilmen, Çağla Danacı, Merve Parlak Baydoğan, Seda Arslan Tuncer, Ahmet Çınar, Taner Tuncer

Journal of Imaging Informatics in Medicine, published 2025-06-02. DOI: 10.1007/s10278-025-01564-w
Abstract
Scoliosis is a three-dimensional spinal deformity, the most common of the spinal deformities, and in advanced stages it causes severe postural disorders. It can lead to a range of health problems, including pain, respiratory dysfunction, cardiac problems, mental health disorders, stress, and emotional difficulties. The current gold standard for grading scoliosis and planning treatment is the Cobb angle measured on X-rays. Cobb angle measurement is performed by specialists in musculoskeletal disciplines such as physical medicine and rehabilitation, orthopedics, and radiology. Manual calculation of the Cobb angle is subjective and time-consuming, so deep learning-based systems that can evaluate it objectively have recently come into frequent use. In this article, we propose an enhanced Vision Transformer (ViT) that allows physicians to evaluate a scoliosis diagnosis more objectively and without loss of time. The proposed model replaces the ViT's standard multi-head attention mechanism with a custom attention mechanism. A dataset with 7 classes was collected from 1456 patients at the Elazığ Fethi Sekin City Hospital Physical Medicine and Rehabilitation Clinic. Multiple models were compared against the proposed architecture on scoliosis classification. The proposed enhanced ViT achieved the best performance, with 95.21% accuracy, outperforming ResNet50, Swin Transformer, and standard ViT models.
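The abstract does not include implementation details, and the exact form of the authors' custom attention mechanism is not specified. As a rough sketch of the architectural idea only, the following PyTorch code shows how a ViT encoder block's standard multi-head attention can be swapped for a drop-in custom attention module feeding a 7-class classification head. All module names, dimensions, and the placeholder single-head attention are assumptions for illustration, not the authors' published code.

```python
# Minimal sketch: a ViT classifier whose encoder blocks use a custom
# attention module in place of nn.MultiheadAttention. Hyperparameters
# (patch size, depth, embedding dim) are illustrative assumptions.
import torch
import torch.nn as nn

class CustomAttention(nn.Module):
    """Hypothetical stand-in for the paper's custom attention mechanism."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)   # joint query/key/value projection
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); single-head scaled dot-product attention
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.proj(attn @ v)

class EncoderBlock(nn.Module):
    def __init__(self, dim: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = CustomAttention(dim)     # replaces multi-head attention
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm1(x))     # pre-norm residual attention
        return x + self.mlp(self.norm2(x))

class ScoliosisViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=384, depth=6, classes=7):
        super().__init__()
        # Patch embedding for single-channel (grayscale) X-ray images
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        n_tokens = (img_size // patch) ** 2
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_tokens + 1, dim))
        self.blocks = nn.Sequential(*[EncoderBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, classes)  # 7 scoliosis classes

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.embed(x).flatten(2).transpose(1, 2)            # (B, N, dim)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1) + self.pos
        x = self.blocks(x)
        return self.head(x[:, 0])            # classify from the [CLS] token

logits = ScoliosisViT()(torch.randn(2, 1, 224, 224))
print(logits.shape)  # torch.Size([2, 7])
```

In this pattern, the rest of the ViT (patch embedding, positional encoding, residual MLP blocks, classification from the [CLS] token) is unchanged; only the attention module is swapped, which is the kind of targeted modification the abstract describes.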