Latest Articles in IET Image Processing

Speech2Face3D: A Two-Stage Transfer-Learning Framework for Speech-Driven 3D Facial Animation
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-23 DOI: 10.1049/ipr2.70155
Liming Pang, Zhi Zeng, Yahui Li, Guixuan Zhang, Shuwu Zhang
High-fidelity, speech-driven 3D facial animation is crucial for immersive applications and virtual avatars. Nevertheless, advancement is impeded by two principal challenges: (1) a lack of high-quality 3D data, and (2) inadequate modelling of the multi-scale characteristics of speech signals. In this paper, we present Speech2Face3D, a novel two-stage transfer-learning framework that pretrains on large-scale pseudo-3D facial data derived from 2D videos and subsequently fine-tunes on smaller yet high-fidelity 3D datasets. This design leverages the richness of easily accessible 2D resources while mitigating reconstruction noise through a simple temporal smoothing step. Our approach further introduces a Multi-Scale Hierarchical Audio Encoder to capture subtle phoneme transitions, mid-range prosody, and longer-range emotional cues. Extensive experiments on public 3D benchmarks demonstrate that our method achieves state-of-the-art performance on lip synchronization, expression fidelity, and temporal coherence metrics. Qualitative user evaluations validate these quantitative improvements. Speech2Face3D is a robust and scalable framework for leveraging extensive 2D data to generate precise and realistic 3D facial animations based solely on speech.
Citations: 0
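The abstract above mentions a "simple temporal smoothing step" for mitigating reconstruction noise in the pseudo-3D data, but does not specify it. A minimal sketch of one plausible choice, a moving-average filter over per-frame 3D vertex positions (the window size and the (T, V, 3) data layout are assumptions for illustration, not from the paper):

```python
import numpy as np

def temporal_smooth(frames: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing over the time axis of a (T, V, 3)
    sequence of per-frame 3D vertex positions. Edge padding keeps the
    output the same length as the input."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(frames, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    out = np.empty(frames.shape, dtype=float)
    T = frames.shape[0]
    for v in range(frames.shape[1]):       # each vertex
        for c in range(3):                 # each coordinate (x, y, z)
            out[:, v, c] = np.convolve(padded[:, v, c], kernel, mode="valid")[:T]
    return out
```

A static face (constant vertex positions) passes through unchanged, while per-frame reconstruction jitter is averaged out.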
Pose-Guided Re-Identification of Amur Tigers Under Wild Environmental Constraints
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-22 DOI: 10.1049/ipr2.70160
Tianyu Wang, Boxuan Ma, Xinrui Zhao, Chao Mou, Jiahua Fan
The conservation of endangered species is contingent upon accurate and efficient wildlife monitoring, which is essential for informed decision-making and effective preservation strategies. With the global population of Amur tigers (Panthera tigris altaica) falling below 600, innovative conservation strategies are critically needed. Traditional monitoring methods have fallen short in accuracy and efficiency, leading to a shift towards leveraging big data and artificial intelligence for effective wildlife surveillance. Existing re-identification techniques struggle with natural habitat challenges such as occlusions, changing poses, varying light, and limited data. To overcome these issues, we propose the pose-guided dual branch re-identification network (PDBRNet). Our approach integrates pose estimation to guide feature disentanglement and alignment, which is crucial for accurate re-identification, while an image preprocessing method that accounts for illumination factors mitigates the impact of lighting variations on accuracy. Through validation on the occluded and illumination-varying Amur tiger (OIAT) dataset, PDBRNet demonstrates exceptional performance. Specifically, in single-camera scenarios, PDBRNet achieves an outstanding mean average precision (mAP) of 79.4, surpassing PGCFL (51.6) and PPGNet (69.7). Moreover, in cross-camera scenarios, PDBRNet maintains its superiority with a remarkable mAP of 54.0, along with Rank-1 and Rank-5 scores of 97.8 and 98.9, respectively, showcasing its robustness in real-world surveillance applications. PDBRNet significantly enhances re-identification accuracy and holds promise for addressing the complexities of field environments, contributing to wildlife conservation efforts.
Citations: 0
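The Rank-1 and Rank-5 scores quoted above follow the standard CMC protocol for re-identification: a query counts as a hit at rank k if any of its k nearest gallery images shares its identity. A minimal sketch (the function name and the query-by-gallery distance-matrix layout are assumptions, not from the paper):

```python
import numpy as np

def rank_k_accuracy(dist: np.ndarray, q_ids, g_ids, k: int = 1) -> float:
    """CMC Rank-k score: fraction of queries whose k nearest gallery
    entries (ascending distance) contain the query identity.
    dist has shape (num_queries, num_gallery)."""
    q_ids, g_ids = np.asarray(q_ids), np.asarray(g_ids)
    order = np.argsort(dist, axis=1)      # gallery indices, nearest first
    hits = 0
    for i, row in enumerate(order):
        if q_ids[i] in g_ids[row[:k]]:
            hits += 1
    return hits / len(q_ids)
```

By construction the score is non-decreasing in k, which is why Rank-5 (98.9) sits above Rank-1 (97.8) in the results above.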
FUSION: Uncertainty-Guided Federated Semi-Supervised Learning for Medical Image Segmentation
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-22 DOI: 10.1049/ipr2.70147
Abdul Raheem, Zhen Yang, Haiyang Yu, Malik Abdul Manan, Fahad Sabah, Shahzad Ahmed
Federated learning (FL) for medical image segmentation poses critical challenges, including non-IID data distributions, limited access to labelled annotations, and stringent privacy constraints across institutions. To address these, we propose FUSION (Federated Unified Semi-Supervised Optimisation Network), a novel dual-path training framework that integrates both Federated Labelled Data Learning (FLDL) and Federated Unlabelled Data Training (FUDT). Central to FUSION is a two-stage pseudo-label refinement strategy designed to ensure robustness under real-world federated constraints. First, synthetic label denoising is performed using Monte Carlo dropout-based uncertainty estimation, enabling clients to identify and exclude low-confidence predictions. Second, prototype-based correction is applied to further refine pseudo-labels by aligning them with class-specific feature centroids, mitigating errors caused by domain shifts and inter-client variability. These refined labels are used for localised training on unlabelled clients, while a dynamic aggregation scheme modulated by a reliability-based hyperparameter μ adjusts the influence of labelled versus unlabelled clients during global model updates. This tightly coupled interaction between pseudo-label quality and federated optimisation ensures stability, accelerates convergence, and enhances generalisation across heterogeneous clients. FUSION is evaluated on three diverse datasets, TCGA-LGG (brain MRI), Kvasir-SEG (colonoscopy), and UDIAT (ultrasound), and consistently outperforms state-of-the-art FL models in Dice, IoU, HD95, and ASD metrics. The results confirm the critical role of synthetic label refinement in enhancing segmentation accuracy, boundary precision, and model scalability. FUSION provides a technically grounded, privacy-preserving, and label-efficient solution for real-world multi-institutional medical image segmentation tasks.
Citations: 0
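The first refinement stage described above, Monte Carlo dropout-based uncertainty filtering, can be sketched generically: run T stochastic forward passes with dropout active, then exclude pixels whose predictive standard deviation exceeds a threshold. The array shapes, the 0.5 hard-label cutoff, and the threshold value below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def filter_pseudo_labels(mc_probs: np.ndarray, tau: float = 0.05):
    """Given T Monte Carlo dropout passes of shape (T, H, W) holding
    foreground probabilities, return the hard pseudo-label map and a
    boolean mask marking pixels whose predictive standard deviation is
    below tau (i.e. confident enough to train on)."""
    mean = mc_probs.mean(axis=0)          # predictive mean per pixel
    std = mc_probs.std(axis=0)            # predictive uncertainty
    labels = (mean > 0.5).astype(np.uint8)
    keep = std < tau
    return labels, keep
```

Pixels where the stochastic passes disagree get a high standard deviation and are masked out of the unlabelled clients' local training loss.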
A Review of Deep Learning-Based Medical Image Segmentation
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-21 DOI: 10.1049/ipr2.70163
Xinyue Zhang, Jianfeng Wang, Xiaochun Cheng, Junran Li
Medical image segmentation, the process of precisely delineating regions of interest (e.g. organs, lesions, cells) within medical images, is a pivotal technique in medical image analysis. It finds widespread application in computer-aided diagnosis, surgical planning, radiation therapy, and pathological analysis, thus playing a crucial role in enabling precision medicine and enhancing the quality of clinical care. Traditional medical image segmentation methods often rely on hand-crafted features and rule-based approaches, which struggle to handle the inherent complexity and variability of medical imagery, leading to limitations in segmentation accuracy and robustness. Recently, deep learning methodologies, driven by their powerful capabilities in automatic feature learning and non-linear modelling, have overcome the limitations of traditional methods and achieved significant advancements in the field of medical image segmentation. This review provides a comprehensive overview and summary of recent progress in deep learning-based medical image segmentation, with a particular focus on fully supervised learning paradigms leveraging convolutional neural networks, transformers, and the segment anything model. We delve into the underlying principles, network architectures, advantages, and limitations of these approaches. Furthermore, we systematically compare their performance across diverse imaging modalities, anatomical structures, and pathological targets. We also present a curated compilation of commonly used datasets, evaluation metrics, and loss functions relevant to medical image segmentation. Finally, we discuss future research directions and potential challenges, offering insights into the evolving landscape of this critical field.
Citations: 0
Adaptive Logit Reconstruction in Knowledge Distillation
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-18 DOI: 10.1049/ipr2.70162
Han Chen, Cunkang Wu, Meng Han, Xuyang Teng
In logit-based knowledge distillation, the student model learns the classification behaviour of the teacher network from its high-dimensional, abstract logits. Nevertheless, the teacher network is not an optimal learning target: on common datasets such as CIFAR100 and ImageNet, the majority of models exhibit classification accuracies of only 60% to 80%. These teacher errors are a significant part of knowledge distillation that cannot be ignored. To enable the student to acquire more accurate knowledge, we propose adaptive logit reconstruction knowledge distillation (ALRKD). ALRKD corrects errors by using the standard deviation, which represents the fluctuation degree of the logit distribution. Furthermore, to compensate for the information loss incurred during correction, an additional branch is designed to provide supplementary knowledge about the relationships between the other classes. Experiments on common datasets demonstrate the significant superiority of ALRKD.
Citations: 0
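The abstract does not detail ALRKD's standard-deviation-based correction, so it cannot be reproduced here; what can be sketched is the standard temperature-scaled logit-distillation loss (Hinton et al.'s KD loss) that logit-reconstruction methods of this kind plug their corrected teacher logits into. This is the generic loss, not the ALRKD correction itself:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, numerically stabilised."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 so gradient magnitudes stay comparable
    across temperatures (as in Hinton et al.)."""
    p = softmax(teacher_logits, T)   # soft teacher targets
    q = softmax(student_logits, T)   # student predictions
    return float(T * T * np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

A corrected (reconstructed) teacher logit vector would simply replace `teacher_logits` here; the loss is zero when student and teacher logits match.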
A Spatial and Global Correlation-Aware Network for Multiple Sclerosis Lesion Segmentation from Multi-Modal MR Images
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-17 DOI: 10.1049/ipr2.70164
Zhanlan Chen, Xiuying Wang, Jing Huang, Jie Lu, Jiangbin Zheng
Multiple sclerosis (MS) lesion segmentation from MR imaging is a prerequisite step in the clinical diagnosis and treatment of brain diseases. However, automated segmentation of MS lesions remains a challenging task, owing to the variant morphology and uncertain distribution of lesions across subjects. Despite the success of existing methods, two problems persist in automated segmentation of MS lesions: the lack of an effective feature enhancement approach for capturing locality context, and the lack of global coherence in per-pixel prediction. Hence, we propose a correlation learning network for both local and global context. Specifically, we propose a sparse spatial correlation module to learn the spatial correlations within neighbours for local context, and a global coherence module to encode long-range dependencies for global context. The proposed method is evaluated on the public ISBI2015 dataset and a private in-house dataset collected from a hospital. Experimental results show the competitive performance of our method against state-of-the-art methods.
Citations: 0
Reversible Data Hiding via Bit-Plane Block Rearrangement and Intra-Block Compression Coding for Encrypted Images
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-16 DOI: 10.1049/ipr2.70158
Shuyi Deng, Nianqiao Li, Chunqiang Yu, Xianquan Zhang, Zhenjun Tang
Reversible data hiding in encrypted images (RDHEI) enables secret data embedding within encrypted images while allowing for the lossless recovery of the original image after data extraction. This technique holds significant applications in domains such as cloud storage and data security. However, many existing RDHEI methods suffer from limited embedding capacity. To address this limitation, we present a novel, high-capacity RDHEI algorithm based on bit-plane block rearrangement and intra-block compression coding (hereafter the BRBCC algorithm). First, the prediction error (PE) image is generated by using a median edge detection predictor, and the high-order zero-valued bit-planes are compressed. The non-zero-valued bit-planes are then separated into non-overlapping blocks that can be classified as all-zero blocks, embeddable blocks, or non-embeddable blocks. These blocks are then sorted and grouped by block type for block coding. Finally, a new intra-block compression coding technique with small coded data for locating block elements is proposed to conduct effective compression and thereby reserve more space for embedding secret data. Experimental results indicate that the embedding rates of the BRBCC algorithm reach 3.9381 and 3.8436 bpp on the public BOSSbase and BOWS-2 datasets, respectively, outperforming some state-of-the-art RDHEI algorithms and exhibiting good application potential.
Citations: 0
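The median edge detection (MED) predictor used above to build the prediction-error image is the standard causal predictor from JPEG-LS: each pixel is predicted from its left (a), upper (b), and upper-left (c) neighbours. A straightforward sketch (the border handling is an assumption; the paper does not say how the first row and column are treated):

```python
import numpy as np

def med_prediction_error(img: np.ndarray) -> np.ndarray:
    """Prediction-error image under the MED (median edge detection)
    predictor of JPEG-LS:
        pred = min(a, b)   if c >= max(a, b)
               max(a, b)   if c <= min(a, b)
               a + b - c   otherwise.
    The first row and column, which lack causal context, are kept
    verbatim (an illustrative choice)."""
    img = img.astype(int)
    pe = img.copy()
    H, W = img.shape
    for i in range(1, H):
        for j in range(1, W):
            a, b, c = img[i, j - 1], img[i - 1, j], img[i - 1, j - 1]
            if c >= max(a, b):
                pred = min(a, b)
            elif c <= min(a, b):
                pred = max(a, b)
            else:
                pred = a + b - c
            pe[i, j] = img[i, j] - pred
    return pe
```

On smooth regions the errors concentrate near zero, which is what makes the high-order bit-planes of the PE image mostly zero-valued and therefore highly compressible.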
Spatiotemporal Context Adapting Framework for Visual Object Tracking
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-16 DOI: 10.1049/ipr2.70150
Kunlong Zhao, Dawei Zhao, Xu Wang, Liang Xiao, Yulong Huang, Yiming Nie, Yonggang Zhang, Bin Dai
Visual object tracking is widely applied in intelligent transportation and visual surveillance systems serving smart cities, as well as in autonomous vehicles. Existing methods usually formulate visual object tracking within a relation-modelling framework, aided by auxiliary spatial context and temporal information. The spatial context is often extracted by enlarging the target template, which introduces more background and positional information; the temporal correlation is obtained by associating the search image with previous images. However, due to noise interference, existing methods often exploit the auxiliary data only partially, leading to underutilisation of spatiotemporal information. To address these issues, we propose a novel and concise tracking framework that uniformly encodes all auxiliary data, including the enlarged target template, previous images, and the corresponding target bounding boxes. Specifically, to mitigate the unstable factors introduced by these raw inputs, we propose a spatiotemporal context adaptive encoder, which can adaptively select appropriate information from noisy data. Extensive experiments show that the proposed method achieves state-of-the-art performance on various benchmarks, demonstrating its superiority.
Citations: 0
CR-YOLOv8-Based Detection Method for Identifying Non-Functional Satellite Components
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-16 DOI: 10.1049/ipr2.70153
He Bian, Derui Zhang, Cheng Li, Zhe Zhang, Wenjie Liu, Jianzhong Cao, Chao Mei, Gaopeng Zhang
Detecting non-functional satellite components is critical for on-orbit servicing. Current detection methods struggle with complex image noise, motion blur in space environments, and the limited realism of artificially synthesised sample data. To address these challenges, we propose an enhanced you only look once version 8 (YOLOv8)-based method. In terms of network architecture, we introduce innovative designs for the backbone and neck components. A novel hybrid attention mechanism replaces the conventional approach, improving the perception and processing of intricate image features and significantly enhancing feature extraction. Additionally, we integrate modules inspired by residual networks into the neck structure, improving training adaptability and ensuring robust information transmission. This design highlights key target features while minimising feature attenuation. We also establish the satellite key element (SAKE) dataset under simulated real space conditions, including image noise and jitter blur. This dataset features components such as satellite bodies and solar panels and uses an encoder-decoder network architecture to refine context information. By merging this with a branch preserving high-resolution details, we enhance dataset expressiveness. Experiments demonstrate that the enhanced algorithm achieves a mean average precision (mAP) of 78.98% on the SAKE dataset, a 2.57% improvement over the original YOLOv8. The refined model effectively detects critical satellite components, showing superior performance in noisy and blurry scenarios.
Citations: 0
MBSM-Net: A Multi-Branch Structure Model for Pneumoconiosis Screening and Grading of Chest X-Ray Images
IF 2.2 · CAS Tier 4 · Computer Science
IET Image Processing Pub Date: 2025-07-16 DOI: 10.1049/ipr2.70128
Shuzhi Su, Yifan Wang, Yanmin Zhu, Yong Dai, Zekuan Yu, Zhi-Ri Tang, Bo Li, Shengzhi Wang
Convolutional neural network (CNN)-based auxiliary diagnostic systems have been widely proposed. However, CNNs are limited in perceiving global and more subtle features, which prevents existing methods from achieving ideal accuracy in tasks such as pneumoconiosis screening. To overcome these limitations, we propose MBSM-Net, a new multi-branch structure-enhanced model for pneumoconiosis screening and grading based on X-ray images. MBSM-Net introduces an adaptive feature selection and fusion module to achieve synchronous extraction and hierarchical fusion of global and local features. In the local feature extraction branch, we designed a CNN-Mamba module: a detail enhancement module integrates prior information to compensate for the shortcomings of traditional convolutions and significantly strengthens the expression of subtle lesion information, while the Mamba module models pixel-level long-range dependencies to extract finer-grained texture features. In the global feature extraction branch, we utilize the windowed multi-head self-attention (W-MSA) mechanism, enabling the model to better understand the overall distribution and degree of fibrosis of pulmonary lesions. We validated MBSM-Net on chest X-ray films from 1,760 real anonymized patients. The results show that MBSM-Net reaches an accuracy of 78.6% and an F1 score of 79%, both superior to existing models.
Citations: 0
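The windowed multi-head self-attention (W-MSA) mechanism mentioned above computes attention within non-overlapping local windows of the feature map rather than globally. The window-partition step that precedes the attention can be sketched as follows (Swin-style; the divisibility requirement and the (H, W, C) array layout are illustrative assumptions, not details from the paper):

```python
import numpy as np

def window_partition(x: np.ndarray, ws: int) -> np.ndarray:
    """Split a (H, W, C) feature map into non-overlapping (ws, ws)
    windows, returning (num_windows, ws*ws, C): the token layout on
    which windowed multi-head self-attention operates. H and W are
    assumed to be divisible by ws."""
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    x = x.transpose(0, 2, 1, 3, 4)     # (H/ws, W/ws, ws, ws, C)
    return x.reshape(-1, ws * ws, C)
```

Attention cost then scales with the window size rather than the full image, which is what makes the mechanism affordable on high-resolution chest X-rays.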