MBTC-Net: Multimodal brain tumor classification from CT and MRI scans using deep neural network with multi-head attention mechanism

JCR: Q3 (Medicine)

Medicine in Novel Technology and Devices, vol. 27, Article 100382. Pub Date: 2025-06-30. DOI: 10.1016/j.medntd.2025.100382

Satrajit Kar, Pawan Kumar Singh

Citations: 0

Abstract

Brain tumors pose a formidable threat in contemporary healthcare because of their diverse histological profiles and unpredictable clinical behavior. Their spectrum ranges from slow-growing benign tumors to highly aggressive malignancies in sensitive anatomical locations. This necessitates a sharper focus on their pathophysiology and precise characterization to support patient-specific therapy. Artificial intelligence techniques for identifying brain tumors are often applied to segmentation and detection tasks; however, their lack of generalizable results hinders medical practitioners from incorporating them into the diagnostic process. Research has relied predominantly on Magnetic Resonance Imaging (MRI), while work on other imaging methods such as Positron Emission Tomography (PET) and Computed Tomography (CT) remains scarce owing to a dearth of open-access datasets. Our study proposes a robust MBTC-Net framework that uses EfficientNetV2B0 to extract high-dimensional feature maps, reshapes them into sequences, and applies multi-head attention to capture contextual dependencies. After reshaping the attention output back into a spatial structure, we apply average pooling before passing the result to dense layers with batch normalization and dropout. The model is fine-tuned with the Adamax optimizer and uses a softmax output to classify various kinds of brain tumors from T1-weighted, T1 contrast-enhanced, and T2-weighted MRI sequences and CT scans. To reduce the risk of overfitting, measures such as stratified 5-fold cross-validation were applied across three open-access Kaggle datasets, yielding accuracies of 97.54% (15-class), 97.97% (6-class), and 99.34% (2-class), respectively. We also applied Grad-CAM to interpret and visually analyze the framework's predictions. This research underscores the need for multimodal training on CT scans and MRI sequences to deploy a robust framework in real-time environments and improve patient well-being.
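The classification head described in the abstract (feature map → token sequence → multi-head attention → spatial reshape → average pooling → dense layers with batch normalization and dropout → softmax) can be sketched as a NumPy forward pass. This is a minimal inference-only sketch under assumptions, not the authors' implementation: the abstract gives no head count, dense width, activation, or residual connection, so `num_heads=4`, one hidden dense layer with ReLU, and no residual are assumptions, and `mbtc_style_head`, `init_params`, and the `params` keys are hypothetical names. Random weights stand in for trained ones. Assuming a 224×224 input, Keras's EfficientNetV2B0 backbone without its top yields a 7×7×1280 feature map, i.e. 49 tokens of 1280 channels; the demo uses 64 channels to stay small.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, params, num_heads):
    """Scaled dot-product self-attention over a (seq_len, dim) token matrix."""
    seq_len, dim = tokens.shape
    if dim % num_heads != 0:
        raise ValueError("dim must be divisible by num_heads")
    head_dim = dim // num_heads

    def split_heads(m):
        # (seq_len, dim) -> (num_heads, seq_len, head_dim)
        return m.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(tokens @ params["wq"])
    k = split_heads(tokens @ params["wk"])
    v = split_heads(tokens @ params["wv"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (heads, seq, seq)
    attn = softmax(scores, axis=-1)  # each token's weights over all tokens
    heads = attn @ v  # (heads, seq, head_dim)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, dim)  # concatenate heads
    return merged @ params["wo"]  # output projection back to `dim`


def mbtc_style_head(feature_map, params, num_heads=4, eps=1e-3):
    """Hypothetical MBTC-Net-style head: attention -> pooling -> dense -> softmax."""
    h, w, c = feature_map.shape
    tokens = feature_map.reshape(h * w, c)  # spatial grid -> sequence of h*w tokens
    attended = multi_head_self_attention(tokens, params, num_heads)
    spatial = attended.reshape(h, w, c)  # sequence -> spatial grid again
    pooled = spatial.mean(axis=(0, 1))  # global average pooling -> (c,)
    hidden = pooled @ params["w1"] + params["b1"]  # dense layer
    # Batch normalization in inference mode, using stored running statistics.
    hidden = (hidden - params["bn_mean"]) / np.sqrt(params["bn_var"] + eps)
    hidden = params["bn_gamma"] * hidden + params["bn_beta"]
    hidden = np.maximum(hidden, 0.0)  # ReLU (assumed activation)
    # Dropout is the identity at inference time, so it is omitted here.
    logits = hidden @ params["w2"] + params["b2"]
    return softmax(logits)  # class probabilities


def init_params(c, hidden, num_classes, seed=0):
    """Random stand-in weights; a trained model would load learned values."""
    rng = np.random.default_rng(seed)

    def dense(n_in, n_out):
        return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))

    return {
        "wq": dense(c, c), "wk": dense(c, c), "wv": dense(c, c), "wo": dense(c, c),
        "w1": dense(c, hidden), "b1": np.zeros(hidden),
        "bn_mean": np.zeros(hidden), "bn_var": np.ones(hidden),
        "bn_gamma": np.ones(hidden), "bn_beta": np.zeros(hidden),
        "w2": dense(hidden, num_classes), "b2": np.zeros(num_classes),
    }


if __name__ == "__main__":
    fmap = np.random.default_rng(1).normal(size=(7, 7, 64))  # stand-in backbone output
    probs = mbtc_style_head(fmap, init_params(c=64, hidden=32, num_classes=15))
    print(probs.shape, round(float(probs.sum()), 6))  # (15,) 1.0
```

Because the attention output is projected back to the input channel width, it can be laid back onto the original 7×7 grid before pooling. During training, batch normalization uses batch statistics and dropout randomly zeroes units; the sketch shows only the inference path.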
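For reference, Adamax is the infinity-norm variant of Adam, introduced alongside it: instead of Adam's running average of squared gradients, it tracks an exponentially weighted maximum of gradient magnitudes. The element-wise update for parameters $\theta$ with gradient $g_t$, learning rate $\alpha$, and decay rates $\beta_1, \beta_2$ (starting from $m_0 = u_0 = 0$) is shown below. The abstract does not report the learning rate or decay rates used, and library implementations such as Keras typically add a small $\epsilon$ to the denominator for numerical safety.

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
u_t &= \max\!\left(\beta_2 u_{t-1},\ \lvert g_t \rvert\right) \\
\theta_t &= \theta_{t-1} - \frac{\alpha}{1 - \beta_1^{t}} \cdot \frac{m_t}{u_t}
\end{aligned}
```

Only the first moment needs the $1 - \beta_1^{t}$ bias correction; the max-based $u_t$ is not biased toward zero in the same way.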
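Stratified k-fold cross-validation assigns samples to folds class by class, so each test fold keeps roughly the class proportions of the full dataset, which matters for imbalanced multi-class sets such as the 15-class one. A minimal dependency-free sketch follows; the function name `stratified_k_fold`, the seed, and the round-robin assignment are illustrative assumptions rather than the authors' procedure. In practice, scikit-learn's `StratifiedKFold(n_splits=5, shuffle=True, random_state=...)` provides this split.

```python
import random
from collections import Counter, defaultdict


def stratified_k_fold(labels, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs whose test folds preserve class proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=str):  # deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)  # randomize which samples land in which fold
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)  # deal samples round-robin
        offset += len(indices)  # continue the rotation so fold sizes stay balanced

    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


if __name__ == "__main__":
    # Hypothetical labels: an imbalanced three-class toy set.
    labels = ["glioma"] * 10 + ["meningioma"] * 5 + ["normal"] * 5
    for fold, (train, test) in enumerate(stratified_k_fold(labels, k=5)):
        counts = Counter(labels[i] for i in test)
        # each fold: 16 train, 4 test, {'glioma': 2, 'meningioma': 1, 'normal': 1}
        print(fold, len(train), len(test), dict(sorted(counts.items())))
```

Every sample appears in exactly one test fold, so the five held-out evaluations together cover the whole dataset.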
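Grad-CAM weights each feature map $A^k$ of a chosen convolutional layer by the spatial average of the gradient of the class score $y^c$ with respect to that map, sums the weighted maps, and applies a ReLU so that only regions with positive evidence for class $c$ remain. The sketch below shows just that combination step in NumPy: obtaining the gradients requires the framework's automatic differentiation, and the abstract does not state which layer MBTC-Net uses, so activations and gradients are assumed to be given. `grad_cam` is a hypothetical name.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Combine feature maps and their gradients into a Grad-CAM heatmap.

    activations: (H, W, K) feature maps A^k from the chosen conv layer
    gradients:   (H, W, K) gradients of the class score y^c w.r.t. A^k
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    weights = gradients.mean(axis=(0, 1))  # alpha_k: spatially averaged gradients
    cam = activations @ weights  # sum_k alpha_k * A^k -> (H, W)
    cam = np.maximum(cam, 0.0)  # ReLU keeps positive evidence for class c
    peak = cam.max()
    return cam / peak if peak > 0 else cam  # scale to [0, 1] for display


if __name__ == "__main__":
    # Toy example: channel 0 supports the class, channel 1 argues against it.
    acts = np.zeros((2, 2, 2))
    acts[0, 0, 0] = 2.0
    acts[1, 1, 1] = 1.0
    grads = np.ones((2, 2, 2))
    grads[..., 1] = -1.0
    print(grad_cam(acts, grads))
```

For visualization, the low-resolution heatmap is usually upsampled to the input image size and overlaid on the scan.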

Source journal

Medicine in Novel Technology and Devices (subject area: Medicine, Medicine (miscellaneous))

CiteScore: 3.00

Self-citation rate: 0.00%

Review time: 64 days