Biomedical Signal Processing and Control: Latest Articles

A learnable transformer decoder for blurred near-infrared blood vessel segmentation using domain adaptation with limited data
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108335. Pub Date: 2025-07-23. DOI: 10.1016/j.bspc.2025.108335
Jiazhe Wang, Yuya Ieiri, Osamu Yoshie, Koichi Shimizu
Abstract: Transillumination imaging using near-infrared (NIR) light is an effective method for visualizing subcutaneous blood vessels. However, as vessel depth increases, images become severely blurred due to strong light scattering in body tissue. Although deep learning has shown great promise in segmenting clear vessels, many models struggle to segment blurred NIR blood vessels accurately because of limited training data and the non-learnability of bilinear upsampling. To address these challenges, this study introduces a new decoder model named the DeMerge Transformer (DMTrans). It uses a learnable upsampling approach that adaptively enlarges useful information from an encoder pretrained in related fields, maintaining accurate results during fine-tuning with limited data. DMTrans combines a DeMerging (DM) operation with a channel transformer mechanism. The DM operation leverages channel information, which grows richer as the layers deepen, to initialize spatial features; the transformer then reconstructs the DM features to fuse global information and long-range dependencies, yielding a dense, learnable upsampling process. The method was evaluated on five blood vessel datasets (DRIVE, STARE, CHUAC, CHASE_DB1, and HRF) and an NIR transillumination image dataset (HV_NIR), achieving an average 1.89% improvement in Dice score across these benchmarks. The results demonstrate that DMTrans not only transfers prior knowledge more effectively but is also well suited to datasets that are small in scale yet demand high precision.
Citations: 0
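
The decoder's key claim is that bilinear upsampling is non-learnable, while the DM operation builds spatial detail from channel information. A minimal PyTorch sketch of that general channel-to-space idea, with a toy channel-attention step standing in for the channel transformer; the module name, shapes, and attention design are illustrative assumptions, not the authors' DMTrans:

```python
import torch
import torch.nn as nn

class LearnableUpsample(nn.Module):
    """Channel-to-space upsampling: learn the new pixels from channel
    information instead of interpolating them (cf. bilinear)."""
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        # Expand channels so PixelShuffle can fold them into space.
        self.expand = nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=1)
        self.shuffle = nn.PixelShuffle(scale)
        # Toy channel attention standing in for the channel transformer.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.shuffle(self.expand(x))      # (B, out_ch, 2H, 2W)
        return x * self.attn(x)               # reweight channels

x = torch.randn(1, 256, 16, 16)               # deep encoder feature map
print(LearnableUpsample(256, 128)(x).shape)   # torch.Size([1, 128, 32, 32])
```

Because the expansion convolution is trained, the upsampling path can adapt to the target domain during fine-tuning, which a fixed bilinear kernel cannot.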
Detection of hyperglycemia and hypoglycemia using deep learning from facial images obtained with an AI image generator
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108351. Pub Date: 2025-07-23. DOI: 10.1016/j.bspc.2025.108351
Hidir Selcuk Nogay, Nalan Hakime Nogay, Hojjat Adeli
Abstract: The identification of hyperglycemia and hypoglycemia is paramount in diabetes care, facilitating prompt interventions to mitigate potential health complications. A novel method is introduced for identifying glycemic states using deep learning on facial images generated by artificial intelligence (AI). Specifically, the EfficientNet-B0 model, a pre-trained convolutional neural network (CNN), is employed with transfer learning to adapt its learned features to glycemic-state classification. The method offers a non-invasive, remote monitoring solution, allowing glycemic status to be assessed without invasive procedures or continuous glucose monitoring devices. The experimental results confirm its effectiveness: the achieved accuracy, recall, and F1-scores validate the model's ability to identify individuals at risk of glycemic abnormalities. By pairing deep learning with facial image analysis, individuals with diabetes can benefit from early detection and prediction of hyperglycemic and hypoglycemic events, enabling timely interventions and treatment adjustments. This approach holds promise for improving glycemic control, reducing the risk of acute complications, and enhancing quality of life for individuals with diabetes.
Citations: 0
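
The classifier is a fine-tuned, pre-trained EfficientNet-B0. A minimal transfer-learning sketch with torchvision; the three-class head and the frozen backbone are assumptions, not the authors' exact training configuration:

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained EfficientNet-B0 and reuse its feature extractor.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)

# Freeze the convolutional backbone; only the new head is trained
# (an assumption; full fine-tuning is also common).
for p in model.features.parameters():
    p.requires_grad = False

# Replace the 1000-class ImageNet head with three assumed classes:
# hypoglycemia / normoglycemia / hyperglycemia.
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 3)
```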
EPVM: An efficient privacy-preserving palm vein model for user authentication
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108287. Pub Date: 2025-07-23. DOI: 10.1016/j.bspc.2025.108287
Divya Singla, Neetu Verma
Abstract: This research introduces a palm vein authentication mechanism that achieves both high accuracy and strong privacy. For accuracy, a novel mix of preprocessing techniques (ROI extraction, a modified adaptive filter, histogram equalization, and skeletonization) is applied before evaluating several classification models: decision tree, random forest, Principal Component Analysis with Support Vector Machine (PCA-SVM), Principal Component Analysis with Random Forest (PCA-RF), and Partial Least Squares Regression (PLS-R). The experiments show that PLS-R outperforms the other models; it is then iterated over several numbers of components to find the best-performing configuration by criteria such as accuracy and variance explained. For privacy, a distance matrix of the optimized PLS-R model is computed during training and used to generate a secured template for authentication. Compared with state-of-the-art techniques, the approach ensures both accuracy and privacy in biometric authentication through feature reduction, distance-matrix computation, and threshold-based decision making for identity verification. Overall, the proposed model provides a secure authentication mechanism with high accuracy and privacy preservation of biometric traits.
Citations: 0
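
A sketch of the verification pipeline the abstract outlines: PLS-R reduces the vein features, enrolled templates live in the reduced space, and a distance threshold accepts or rejects an identity claim. Feature dimensions, the component count, the mean-template scheme, and the threshold value are placeholders:

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 64))      # preprocessed vein features
y_train = rng.integers(0, 10, size=200)   # user labels (10 enrolled users)

# Fit PLS-R as a supervised feature reducer. The paper tunes the
# component count by iterating; 8 here is a placeholder.
pls = PLSRegression(n_components=8)
pls.fit(X_train, y_train)
Z_train = pls.transform(X_train)

# One enrolled template per user: mean of that user's reduced features
# (an assumed template scheme, standing in for the secured template).
templates = np.stack([Z_train[y_train == u].mean(axis=0) for u in range(10)])

def verify(x_query, claimed_user, threshold=1.5):
    """Accept the claim if the reduced query is close to the template."""
    z = pls.transform(x_query.reshape(1, -1))
    dist = cdist(z, templates[claimed_user:claimed_user + 1])[0, 0]
    return dist <= threshold

print(verify(X_train[0], int(y_train[0])))
```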
Adaptive thresholding of DWT coefficients using UNet for denoising real-life respiratory sounds
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108232. Pub Date: 2025-07-23. DOI: 10.1016/j.bspc.2025.108232
Subhasikta Behera, Puneet Kumar Jain, Sambit Bakshi
Abstract: Early diagnosis of respiratory diseases through sound analysis is often challenged by noise from various internal and external sources. Discrete wavelet transform (DWT)-based denoising has shown significant potential for noise suppression, but its effectiveness depends heavily on the choice of thresholding parameters. To address this, we introduce a novel adaptive thresholding approach for DWT coefficients that uses a UNet model for respiratory sound denoising. The UNet maintains spatial localization while integrating contextual information across levels, allowing precise, context-aware thresholding that suppresses noise while preserving signal components. Combining the multi-resolution capabilities of the DWT with the UNet's multi-scale feature extraction, the method demonstrates robustness and efficacy on two public datasets with various noise types and levels, outperforming traditional methods. It achieved a 2.0 dB higher SNR than the second-best result, leading to a 3.45% improvement in sound-event classification accuracy under real-life noise conditions. These results highlight its potential for deployment in rural and remote areas. The code is available at https://github.com/subhasikta/DWT-UNet_denoising_model.
Citations: 0
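
The method thresholds DWT coefficients, with a UNet predicting the thresholds adaptively. A sketch of the surrounding DWT machinery with PyWavelets, using the classical universal threshold where the UNet's per-coefficient output would go; the wavelet, level, and test signal are assumptions:

```python
import numpy as np
import pywt

def denoise_dwt(signal, wavelet="db4", level=4):
    """Decompose, soft-threshold detail coefficients, reconstruct."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Universal threshold from the finest-scale noise estimate; in the
    # paper, a UNet predicts adaptive thresholds for the coefficients
    # instead of this fixed rule.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)

t = np.linspace(0, 1, 4096)
noisy = np.sin(2 * np.pi * 40 * t) + 0.5 * np.random.randn(t.size)
clean = denoise_dwt(noisy)
```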
A novel multi-scale context aggregation and feature pooling network for Mpox classification
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108254. Pub Date: 2025-07-22. DOI: 10.1016/j.bspc.2025.108254
Mehdhar S.A.M. Al-Gaashani, Abduljabbar S. Ba Mahel, Mashael M. Khayyat, Ammar Muthanna
Abstract: Mpox, previously known as monkeypox, poses a growing global health threat due to its rising incidence. Rapid and accurate identification of Mpox lesions is crucial, especially in resource-limited settings where traditional diagnostics face delays and demand specialized resources. This study introduces a deep learning model that combines MobileNetV2, a Multi-Scale Context Aggregator (MSCA), and a Feature Pooling Block to improve Mpox detection from medical images. The MSCA module employs dilated convolutions and global pooling to capture multi-scale features, while the Feature Pooling Block strengthens spatial and channel dependencies for refined feature representation. The architecture remains computationally efficient, making it suitable for deployment in low-resource settings. Evaluated on four diverse datasets, the model achieved high performance: 93.62% accuracy and 94.28% precision on MSLDV1; 100% accuracy and precision on MSLDV2; 96.15% accuracy and 96.13% precision on MSID; and 98.80% accuracy and precision on a self-collected dataset. These results underscore the model's accuracy and generalization, positioning it as a promising solution for Mpox classification in clinical and research applications.
Citations: 0
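
The MSCA module pairs dilated convolutions with global pooling, much like an ASPP block. A compact PyTorch sketch of that pattern; the branch count and dilation rates are assumptions, not the paper's exact module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSCA(nn.Module):
    """Multi-scale context: parallel dilated convs plus a global-pool branch."""
    def __init__(self, ch, rates=(1, 2, 4)):
        super().__init__()
        # Each branch sees a different receptive field via its dilation.
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(ch, ch, 1))
        self.fuse = nn.Conv2d(ch * (len(rates) + 1), ch, 1)

    def forward(self, x):
        feats = [b(x) for b in self.branches]
        # Broadcast the global-context vector back to the spatial grid.
        g = F.interpolate(self.pool(x), size=x.shape[-2:], mode="nearest")
        return self.fuse(torch.cat(feats + [g], dim=1))

print(MSCA(32)(torch.randn(1, 32, 28, 28)).shape)  # torch.Size([1, 32, 28, 28])
```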
TUTNet: Leveraging transformers for comprehensive extraction and preservation of global information in medical image segmentation
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108277. Pub Date: 2025-07-22. DOI: 10.1016/j.bspc.2025.108277
Minquan Zhao, Jiahe Yu, Hui Qi, Qin Gu, Miao Wang, Yaduan Ruan
Abstract: In recent years, introducing Transformers into medical image segmentation has proven effective. Compared with traditional CNN methods, Transformers excel at extracting global context. However, a purely Transformer-based architecture not only increases the network's parameter count but also weakens local feature extraction, and accurately extracting and integrating both local and global features is crucial for medical image segmentation. We therefore propose TUTNet, an improved medical image segmentation network with a dual-encoder architecture: a CNN-based encoder that effectively extracts local information and a Transformer-based encoder that captures global information. Skip connections receive information from both encoders simultaneously, eliminating the semantic gap between them through a cross-attention mechanism. Skip connections are themselves central to medical image segmentation; in our network they use a purely Transformer mechanism that fully exploits the information extracted by the encoders at every level, applying self-attention to channel information and then cross-attention so that local information queries and filters global information, extracting valuable features for the decoder at each level. Extensive validation on four datasets demonstrates the network's effectiveness across data with different characteristics. Our code is available at https://github.com/whycantChinese/TUTNet.
Citations: 0
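
A sketch of the cross-attention fusion described for the skip connections, where flattened local CNN tokens query global Transformer tokens; token counts and dimensions are illustrative, and the channel self-attention step is omitted for brevity:

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Local (CNN) tokens query global (Transformer) tokens."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local_tokens, global_tokens):
        # Queries come from the local branch; keys/values from the global
        # branch, so local features filter the global context.
        fused, _ = self.attn(query=local_tokens,
                             key=global_tokens,
                             value=global_tokens)
        return self.norm(local_tokens + fused)   # residual + norm

local_t = torch.randn(2, 196, 256)    # flattened CNN feature map
global_t = torch.randn(2, 196, 256)   # Transformer encoder tokens
print(CrossAttentionFusion(256)(local_t, global_t).shape)  # (2, 196, 256)
```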
Accelerating biological imaging of atomic force microscopy by deep-learning
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108220. Pub Date: 2025-07-22. DOI: 10.1016/j.bspc.2025.108220
Haiyue Yu, Nan Li, Feifan Yao, Baichuan Wang, Hangze Song, Zuobin Wang
Abstract: Atomic force microscopy (AFM) is widely used for surface imaging at the nanoscale. However, obtaining high-quality images requires slow scanning, which often introduces noise and sample drift and causes distortion. In addition, large-scale, high-resolution AFM scanning consumes considerable sample material and time. Although traditional bicubic interpolation can produce higher-resolution images, it cannot restore fine detail, and its results can deviate substantially from true high-resolution images. This paper proposes a fast imaging method that combines scanning data acquisition with a deep-learning image super-resolution algorithm, quickly generating high-quality images with little noise interference and small drift. High-resolution scans of viral samples acquired in a clean-room environment were combined with higher-order degradation modeling to generate paired low- and high-resolution images. To address the time and cost limitations of high-resolution scanning, the Real-Enhanced Super-Resolution Generative Network is optimized by incorporating a lightweight Dynamic Local and Global Self-Attention Network, and L1 loss is replaced with Smooth L1 loss to accelerate convergence and improve performance. Validation on novel coronavirus and influenza virus samples shows that the approach reduces imaging time by approximately 20-fold while maintaining high-resolution quality. The resulting super-resolution images preserve fine viral morphology with improved clarity and accuracy, supporting subsequent research on virus classification and image-based detection.
Citations: 0
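
One concrete change the abstract names is swapping L1 loss for Smooth L1 loss in the super-resolution objective. A sketch of that swap inside a single training step; the generator here is a trivial stand-in for the optimized Real-ESRGAN-style network, and the beta value is an assumption:

```python
import torch
import torch.nn as nn

generator = nn.Sequential(                 # stand-in for the SR generator
    nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1))
opt = torch.optim.Adam(generator.parameters(), lr=1e-4)

# Smooth L1 is quadratic near zero (stable gradients, faster
# convergence) and linear for large errors (robust), unlike plain L1.
pixel_loss = nn.SmoothL1Loss(beta=0.01)    # was: nn.L1Loss()

degraded = torch.randn(4, 1, 64, 64)       # synthetically degraded AFM patch
target = torch.randn(4, 1, 64, 64)         # high-resolution ground truth

opt.zero_grad()
loss = pixel_loss(generator(degraded), target)
loss.backward()
opt.step()
```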
Gaze-guided vision transformer for chest X-ray image classification
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108298. Pub Date: 2025-07-22. DOI: 10.1016/j.bspc.2025.108298
Zihui Chen, Zhi Liu, Yingjie Song
Abstract: Thoracic diseases pose significant public health challenges worldwide, with chest X-rays (CXR) playing a critical role in their diagnosis and monitoring. Recent advances in deep learning have enabled AI-driven diagnostic systems to improve the accuracy and efficiency of CXR classification, but the interpretability of these models remains a key concern. We propose a novel approach that integrates expert gaze data into a Vision Transformer (ViT) to improve interpretability in CXR classification. Gaze information enters the model through the Gaze Information Injector (GII), which directs attention to clinically relevant regions based on expert knowledge; to reduce overfitting, the GII compresses gaze features while aligning the model's focus with expert gaze patterns. For robustness, the training strategy mixes gaze-annotated and non-annotated samples, allowing inference without gaze input. Evaluations on the EG-CXR and OMSI datasets show improved classification performance over state-of-the-art models. Ablation studies validate the effectiveness of the GII module, and visualizations highlight improved interpretability through focus on disease-related regions in CXR images.
Citations: 0
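
A sketch of one simple way to inject a gaze heatmap into a ViT: pool the heatmap down to the patch grid and use it to modulate patch embeddings, with a residual path so inference without gaze (weight zero) still works. This is an illustrative stand-in, not the paper's GII design:

```python
import torch
import torch.nn.functional as F

def inject_gaze(patch_tokens, gaze_map, grid=14):
    """patch_tokens: (B, grid*grid, D); gaze_map: (B, 1, H, W) in [0, 1]."""
    # Compress the gaze heatmap to one weight per patch.
    w = F.adaptive_avg_pool2d(gaze_map, grid)   # (B, 1, 14, 14)
    w = w.flatten(2).transpose(1, 2)            # (B, 196, 1)
    # Emphasize patches the expert looked at; the identity term keeps
    # the model usable when no gaze is available (w = 0).
    return patch_tokens * (1.0 + w)

tokens = torch.randn(2, 196, 768)   # ViT patch embeddings
gaze = torch.rand(2, 1, 224, 224)   # expert gaze heatmap
print(inject_gaze(tokens, gaze).shape)   # torch.Size([2, 196, 768])
```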
IGARN: IVUS Guidewire Artifact Removal Network based on unpaired unsupervised learning
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108296. Pub Date: 2025-07-22. DOI: 10.1016/j.bspc.2025.108296
Yucheng Zhao, Jingzhao Sun, Mengqi Wang, Fengming Jia, Jinlin Ren, Jing Zhong, Shujun Fu, Bo Hu
Abstract: Intravascular ultrasound (IVUS) is a critical tool for cardiovascular examination, providing real-time in vivo cross-sectional images of blood vessels. Its utility is often compromised, however, by fan-shaped guidewire artifacts generated by the mechanically rotating probe, which obscure vascular structures and hinder diagnostic accuracy and quantitative analysis. Existing methods struggle with the complex structure of guidewire artifacts and the lack of clinically relevant paired datasets. To address these issues, this paper introduces a novel framework for guidewire artifact removal. First, guidewire artifacts are represented with a linear model rather than the conventional additive model, reflecting the imaging principles by which the artifacts arise. Second, a half-cutting strategy creates unpaired training datasets consisting of half-images with and without guidewire artifacts, enabling effective unsupervised learning. Third, an IVUS Guidewire Artifact Removal Network (IGARN) is designed based on image-to-image translation. Extensive experiments on synthetic and clinical datasets show that IGARN outperforms several widely used unsupervised image translation methods in both quantitative and qualitative evaluations, effectively removing guidewire artifacts while preserving the main features of artifact-free areas. Importantly, the method improves segmentation of the external elastic membrane (EEM) and can be integrated into real-time clinical workflows.
Citations: 0
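
A sketch of the half-cutting idea for building unpaired training data. The assumption here is that frames are handled in polar coordinates, where the fan-shaped artifact spans a contiguous angular range, so each frame splits into an artifact half and a clean half; the artifact-locating step is taken as given:

```python
import numpy as np

def half_cut(polar_frame, artifact_cols):
    """polar_frame: (depth, angle) IVUS image in polar coordinates.
    Returns (artifact_half, clean_half) as unpaired training samples."""
    n = polar_frame.shape[1]
    # Rotate so the guidewire artifact sits in the first half of the
    # angular axis (locating artifact_cols is assumed done upstream).
    shifted = np.roll(polar_frame, -artifact_cols[0], axis=1)
    artifact_half = shifted[:, : n // 2]   # contains the fan artifact
    clean_half = shifted[:, n // 2 :]      # artifact-free
    return artifact_half, clean_half

frame = np.random.rand(256, 360)           # toy polar IVUS frame
art, clean = half_cut(frame, artifact_cols=(40, 80))
print(art.shape, clean.shape)              # (256, 180) (256, 180)
```

Collecting the artifact halves and clean halves into two pools yields the unpaired domains an image-to-image translation network needs.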
Multi-modal multi-scale representation learning via cross-attention between chest radiology images and free-text reports
IF 4.9, CAS Q2 (Medicine)
Biomedical Signal Processing and Control, Volume 111, Article 108318. Pub Date: 2025-07-22. DOI: 10.1016/j.bspc.2025.108318
Daidi Zhong, Xiaoyu Li, Zhiyong Huang, Shiwei Wang, Zhi Yu, Mingyang Hou, Yan Yan, Yushi Liu
Abstract: Patients routinely generate diverse clinical data, particularly radiological images and their corresponding reports. Integrating these heterogeneous modalities enhances diagnostic accuracy and facilitates the practical application of artificial intelligence in clinical settings. However, the complexity of medical tasks and the specificity of domain knowledge make it difficult for deep learning models to exploit cross-modal information effectively. To address this, we propose the Multi-Modal Multi-Scale Transformer Fusion (MMTF) model, which captures lesion features at multiple spatial scales through complementary cross-modal fusion mechanisms. To further align visual and textual representations, MMTF incorporates two generative tasks during training that learn the underlying relationships between images and reports. Notably, even with minimal supervision, MMTF outperforms state-of-the-art pre-trained models on four benchmark X-ray datasets. Interpretability analyses show that MMTF accurately highlights lesion regions, supporting clinical decision-making and demonstrating strong potential for real-world deployment. The code is available at https://github.com/GUESSZERO4/MMTF.git.
Citations: 0
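
A sketch of cross-modal fusion in the spirit of the abstract: image patch tokens attend to report tokens through cross-attention. Dimensions, tokenization, and the single fusion direction are placeholders, not MMTF's exact multi-scale design:

```python
import torch
import torch.nn as nn

class ImageTextFusion(nn.Module):
    """Image tokens query free-text report tokens (one fusion direction)."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.LayerNorm(dim),
                                 nn.Linear(dim, dim * 4), nn.GELU(),
                                 nn.Linear(dim * 4, dim))

    def forward(self, img_tokens, txt_tokens):
        # Image queries gather report evidence via cross-attention.
        attended, _ = self.cross(img_tokens, txt_tokens, txt_tokens)
        x = img_tokens + attended          # residual fusion
        return x + self.ffn(x)

img = torch.randn(2, 196, 512)   # X-ray patch embeddings
txt = torch.randn(2, 64, 512)    # report token embeddings
print(ImageTextFusion()(img, txt).shape)   # torch.Size([2, 196, 512])
```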