{"title":"MBTC-Net: Multimodal brain tumor classification from CT and MRI scans using deep neural network with multi-head attention mechanism","authors":"Satrajit Kar, Pawan Kumar Singh","doi":"10.1016/j.medntd.2025.100382","DOIUrl":null,"url":null,"abstract":"<div><div>Brain tumors pose a singularly formidable threat in contemporary healthcare due to their diverse histological profiles and unpredictable clinical behavior. Their spectrum ranges from slow-growing benign tumors to highly aggressive malignancies in sensitive anatomical locations. This necessitates an intensified focus on their pathophysiology and demands precise characterization for patient-specific therapeutic solutions. Techniques to correctly identify brain tumors using artificial intelligence are often employed for addressing segmentation and detection tasks; however, the lack of generalizable results hinders medical practitioners from incorporating them into the diagnostic process. Predominantly reliant on Magnetic Resonance Imaging, research on other imaging methods like Positron Emission Tomography & Computed Tomography, is scarce due to a dearth of open-access datasets. Our study proposes a robust MBTC-Net framework by leveraging EfficientNetV2B0 for extracting high-dimensional feature maps, followed by reshaping into sequences and applying multi-head attention to capture contextual dependencies. After reintroducing the attention output into a spatial structure, we perform average pooling before transitioning to dense layers, enhanced with batch normalization and dropout. The model is fine-tuned with the Adamax optimizer to classify various kinds of brain tumors using softmax from T1-weighted, T1 Contrast-Enhanced, & T2-weighted MRI sequences and CT scans. 
To reduce the risk of overfitting, measures such as stratified 5-fold cross-validation have been extensively implemented across 3 open-access Kaggle datasets, obtaining 97.54 % (15-class), 97.97 % (6-class), and 99.34 % (2-class) accuracies, respectively. We have also applied Grad-CAM to decipher and visually analyze the predictions made by this framework. This research underscores the need for multimodal training of CT scans and MRI sequences for deploying a sturdy framework in real-time environments and advancing the well-being of patients.</div></div>","PeriodicalId":33783,"journal":{"name":"Medicine in Novel Technology and Devices","volume":"27 ","pages":"Article 100382"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medicine in Novel Technology and Devices","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2590093525000335","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"Medicine","Score":null,"Total":0}
Citations: 0
Abstract
Brain tumors pose a formidable threat in contemporary healthcare due to their diverse histological profiles and unpredictable clinical behavior. Their spectrum ranges from slow-growing benign tumors to highly aggressive malignancies in sensitive anatomical locations, which necessitates an intensified focus on their pathophysiology and precise characterization for patient-specific therapeutic solutions. Artificial-intelligence techniques for identifying brain tumors are often employed for segmentation and detection tasks; however, the lack of generalizable results hinders medical practitioners from incorporating them into the diagnostic process. Research in this field relies predominantly on Magnetic Resonance Imaging (MRI); work on other imaging methods, such as Positron Emission Tomography (PET) and Computed Tomography (CT), is scarce owing to a dearth of open-access datasets. Our study proposes a robust MBTC-Net framework that leverages EfficientNetV2B0 to extract high-dimensional feature maps, reshapes them into sequences, and applies multi-head attention to capture contextual dependencies. After reintroducing the attention output into a spatial structure, we perform average pooling before transitioning to dense layers enhanced with batch normalization and dropout. The model is fine-tuned with the Adamax optimizer and uses a softmax output to classify various kinds of brain tumors from T1-weighted, T1 contrast-enhanced, and T2-weighted MRI sequences and CT scans. To reduce the risk of overfitting, measures such as stratified 5-fold cross-validation were implemented across three open-access Kaggle datasets, yielding accuracies of 97.54% (15-class), 97.97% (6-class), and 99.34% (2-class), respectively. We have also applied Grad-CAM to interpret and visually analyze the predictions made by this framework.
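The core pipeline step described above, flattening backbone feature maps into a sequence, applying multi-head self-attention, restoring the spatial structure, and average pooling, can be sketched in plain NumPy. This is an illustrative sketch only: the feature-map size (7 × 7 × 64), the number of heads, and the identity projections are assumptions, not the authors' exact EfficientNetV2B0 configuration, whose projections would be learned.

```python
import numpy as np

def multi_head_self_attention(x, num_heads):
    """Naive multi-head self-attention over a sequence x of shape (n, d).
    Q/K/V projections are identity slices here for illustration only;
    a trained model learns these projection weights."""
    n, d = x.shape
    assert d % num_heads == 0
    dh = d // num_heads
    out = np.zeros_like(x)
    for h in range(num_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]       # per-head slice (sketch)
        scores = q @ k.T / np.sqrt(dh)              # (n, n) scaled dot-product
        w = np.exp(scores - scores.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)           # softmax over keys
        out[:, h * dh:(h + 1) * dh] = w @ v
    return out

# Pipeline step from the abstract: feature maps -> sequence -> attention
# -> spatial structure -> average pooling. Shapes are illustrative.
H, W, C = 7, 7, 64                                  # assumed backbone output size
feat = np.random.default_rng(0).normal(size=(H, W, C))
seq = feat.reshape(H * W, C)                        # flatten spatial dims
attended = multi_head_self_attention(seq, num_heads=4)
spatial = attended.reshape(H, W, C)                 # restore spatial structure
pooled = spatial.mean(axis=(0, 1))                  # global average pool -> (C,)
```

In the full model, `pooled` would feed the dense layers with batch normalization and dropout before the softmax classifier.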
This research underscores the need for multimodal training on CT scans and MRI sequences to deploy a robust framework in real-time environments and advance the well-being of patients.
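The stratified 5-fold protocol mentioned in the abstract can be sketched with scikit-learn's `StratifiedKFold`. The features and labels below are synthetic stand-ins, not the authors' Kaggle data; the sketch only demonstrates how stratification preserves per-fold class balance.

```python
# Minimal sketch of stratified 5-fold cross-validation (synthetic data,
# not the MBTC-Net datasets).
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 8))           # placeholder feature vectors
y = np.repeat(np.arange(5), 20)         # 5 balanced classes, 20 samples each

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
val_class_counts = []
for train_idx, val_idx in skf.split(X, y):
    # A model would be trained on X[train_idx] and scored on X[val_idx];
    # here we only record the class balance each validation fold receives.
    val_class_counts.append(np.bincount(y[val_idx], minlength=5))

val_class_counts = np.array(val_class_counts)   # shape: (5 folds, 5 classes)
```

Because the classes are balanced (20 samples each) and there are 5 folds, stratification gives every validation fold exactly 4 samples per class, mirroring the overall distribution.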