{"title":"A Hybrid Medical Image Semantic Segmentation Network Based on Novel Mamba and Transformer","authors":"Jianting Shi, Huanhuan Liu, Zhijun Li","doi":"10.1049/ipr2.70205","DOIUrl":"10.1049/ipr2.70205","url":null,"abstract":"<p>Recently, deep learning has greatly advanced medical image segmentation. Convolutional neural networks (CNNs) excel in capturing local image features, whereas ViT adeptly models long-range dependencies through multi-head self-attention mechanisms. Despite their strengths, both CNN and ViT face challenges in efficiently processing long-range dependencies in medical images and often require substantial computational resources. To address this, we propose a novel hybrid model combining Mamba and Transformer architectures. Our model integrates ViT's self-attention modules within a pure-vision Mamba U-shaped encoder, capturing both global and local information through nested Transformer and Mamba modules. Additionally, a multi-scale feed-forward neural network is incorporated within the Mamba blocks to enhance feature diversity by capturing fine-grained local details. Finally, a channel-adaptive feature (CAF) fusion module is introduced at the original skip connections to mitigate feature loss during information fusion and to improve segmentation accuracy in boundary regions. Quantitative and qualitative experiments were conducted on two public datasets: breast ultrasound image (BUSI) and clinic. The Dice score, Intersection over Union (IoU) score, recall score, <i>F</i>1 score and 95% Hausdorff distance (HD95) of the proposed model on the BUSI dataset were 0.7918, 0.7016, 0.8508, 0.7919 and 12.04 mm, respectively. On ClinicDB, these metrics reach 0.9239, 0.8671, 0.9278, 0.9239 and 5.49 mm, respectively. The proposed model outperforms existing state-of-the-art CNN-, Transformer- and Mamba-based methods in segmentation accuracy, according to experimental data.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70205","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145038061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systematic Review of U-Net Optimizations: Advancing Tumour Segmentation in Medical Imaging","authors":"Omar Abueed, Yong Wang, Mohammad Khasawneh","doi":"10.1049/ipr2.70203","DOIUrl":"10.1049/ipr2.70203","url":null,"abstract":"<p>Since its inception in 2015, U-Net has emerged as a cornerstone architecture that is particularly well-designed for medical image segmentation. Despite its robustness, precise tumour segmentation persists as a challenge because of tumour heterogeneity, boundary ambiguity, and the partial volume effects exhibited by tumours. Therefore, the U-Net architecture has been altered many times to expand its capabilities with complex segmentation challenges, particularly with tumours. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology, this systematic review critically evaluates and analyses the effectiveness of recent enhancement strategies developed to optimize the performance of the traditional U-Net architecture in attaining accurate tumour segmentation in CT and MRI images. The strategies have been divided into five main areas: U-Net architectural enhancements, including U-Net backbone optimization; skip connection refinements; bottleneck optimizations; transformer-based integrations; and metaheuristic algorithms as a self-adaptive optimization technique. Afterward, each category is thoroughly examined to determine how the strategies address specific limitations inherent to the traditional U-Net model. In addition, this paper reviews the pivotal role of preprocessing techniques in determining segmentation performance. This review identifies persistent research gaps and offers valuable insights for future research to improve the robustness, accuracy, and clinical applicability of the U-Net model.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70203","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SPWS-Transformer: A Study of 3D Target Detection Method Based on Lightweight Depth Prediction With Multi-Scale Fusion","authors":"Chang'an Zhang, Yian Wang, Ke Xu, ChunHong Yuan, Fusen Guo","doi":"10.1049/ipr2.70204","DOIUrl":"10.1049/ipr2.70204","url":null,"abstract":"<p>Advanced driver assistance systems (ADAS) mainly consist of three components: environmental perception, decision planning, and motion control. As a fundamental component of the ADAS environmental perception system, 3D object detection enables vehicles to avoid obstacles and ensure driving safety only through accurate and real-time prediction and localization of three-dimensional targets such as vehicles and pedestrians in road scenes. Therefore, to improve both the real-time performance and accuracy of 3D object detection, we propose a lightweight depth prediction-based 3D object detection model with multi-scale fusion—SPWS-Transformer. First, to enhance the model's accuracy, we propose a feature extraction network incorporating multi-scale feature fusion and depth prediction. By designing a multi-scale feature fusion module, we effectively combine multi-scale semantic and fine-grained information from feature maps of different scales to enhance the network's feature extraction capability. To capture spatial information from the feature maps, we apply convolution, group normalization, and nonlinear activation operations on the fused feature maps to generate depth feature maps. Both the fused feature maps and depth feature maps serve as inputs for subsequent network stages. To further improve accuracy, we leverage the long-range modelling advantages of Transformers by designing a feature enhancement encoder to strengthen the representation capability of depth feature maps. We incorporate a dilated encoder to perform positional encoding on depth feature maps and utilize multi-head self-attention mechanisms to capture contextual relationships within the input scene, thereby enhancing the detection capability of the 3D object detection network. Then, to improve real-time performance, we design a decoder structure with scale-aware attention. By predefining masks of different scales, we adaptively learn a scale-aware filter using depth and visual features to enhance object queries. Finally, on the KITTI dataset, the improved algorithm achieves an AP of 24.66% for the car category, with more significant improvements in detection accuracy under the ‘hard’ difficulty level. The model achieves an inference time of 24 ms.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70204","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145012683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HoneyFL: Using Honeypots to Catch Backdoors in Federated Learning","authors":"Haibin Zheng, Wenjie Shen, Jinyin Chen","doi":"10.1049/ipr2.70201","DOIUrl":"10.1049/ipr2.70201","url":null,"abstract":"<p>Federated learning (FL) has been revealed as vulnerable to backdoor attacks since the server cannot directly access the locally collected data of clients, even if they are malicious. Many efforts either try to validate the global model with trusted clients, or try to make it difficult or costly to upload malicious updates. Unfortunately, the existing solutions are still challenged in defending against stealthy backdoor attacks or negative impacts brought to the aggregation. Especially in the non-independent and identically distributed setting. Moreover, these methods overlook the threat of adaptive attacks, that is, attackers fully know the defense implementation. To address these issues, we propose a novel run-time defense against diverse backdoor attacks, dubbed <i>HoneyFL</i>. It differs from previous works in three key aspects: (1) <i>effectiveness</i> - it is capable of defending against stealthy backdoors through leveraging honeypot clients; (2) <i>aggregation</i> - it promises effective aggregation since only a limited number of honeypot clients are used; (3) <i>robustness</i> - it can handle adaptive backdoor attacks based on differential prediction. Compared with five state-of-the-art defense baselines, extensive experiments show that HoneyFL produces a higher backdoor detection success rate above 97% and a lower false positive rate below 3%, where seven attacks generate backdoor examples. Its impact on the aggregation results of the main task is negligible. We also show that the defense success rate of HoneyFL against adaptive attacks is approximately <span></span><math>\u0000 <semantics>\u0000 <mo>∼</mo>\u0000 <annotation>$sim$</annotation>\u0000 </semantics></math>3.52 of the baselines on average.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70201","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145012682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computer-Aided Volumetric Quantification of Pre- and Post-Treatment Intracranial Aneurysms in MRA","authors":"Subhash Chandra Pal, Chirag Kamal Ahuja, Dimitrios Toumpanakis, Johan Wikstrom, Robin Strand, Ashis Kumar Dhara","doi":"10.1049/ipr2.70199","DOIUrl":"10.1049/ipr2.70199","url":null,"abstract":"<p>Intracranial aneurysm, a cerebrovascular condition involving abnormal arterial dilation, poses a high risk of subarachnoid hemorrhage upon rupture. Accurate quantification is crucial for diagnosis and follow-up treatment. This paper introduces a novel multi-scale dual-attention network (MSDA-Net) for quantification of intracranial aneurysms in MRA images. The proposed framework includes a context aware patch (CAP) module, multi-scale convolutional blocks, and a dual-attention block, where the CAP module extracts center-line patches to address foreground-background imbalance, the multi-scale and dual-attention blocks enable feature extraction of anatomical dependencies for fine-grained segmentation. The framework leverages three morphological features such as locations of aneurysms, vascular bifurcations, and vessel topology using a multi-task learning scheme for better segmentation. MSDA-Net surpasses state-of-the-art models such as U-Net, residual U-Net, attention U-Net, and nnU-net with an improved dice similarity coefficient of 0.71 and a volume similarity of 0.85. Experiments conducted on the publicly available ADAM challenge dataset and a private post-treatment database demonstrate the reliability and performance of this approach. The method could be used in clinical decision-making in aneurysm follow-up and has profound potential for integration into clinical workflows.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70199","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145011956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SF-Net: Video Frame Interpolation With a 3D Square Funnel Network","authors":"Hamid Azadegan, Ali-Asghar Beheshti Shirazi","doi":"10.1049/ipr2.70193","DOIUrl":"10.1049/ipr2.70193","url":null,"abstract":"<p>Video frame interpolation (VFI) is a problem of designing in-between frames from both the previous and the subsequent frames for enhancing the quality of video. The majority of traditional methods, particularly U-Net-based approaches, suffer from high computational complexity and memory usage in terms of high numbers of parameters. We propose the square funnel network (SF-Net), a novel network structure with significantly fewer parameters but comparable performance, in this paper. SF-Net follows a unique configuration that increases the third dimension of the input frames in deeper layers instead of increasing the number of filters, which results in a more efficient and more compact model. Our model makes use of a maximum of 64 filters in nearly all of its layers, except for the last two layers, which employ 128 filters each. With both objective and subjective evaluation, SF-Net has outstanding visual quality and efficiency, which makes it suitable for low-computational-resource applications. The paper depicts a good direction of VFI, which is to decrease the number of parameters without sacrificing performance.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70193","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145021896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SLSDNet: A Real-Time Stop-Line Detection Network Integrating the Line Segment Detection Method","authors":"Chengkang Liu, Yafei Liu, Ding Hu, Xiaoguo Zhang","doi":"10.1049/ipr2.70194","DOIUrl":"10.1049/ipr2.70194","url":null,"abstract":"<p>Stop-line detection aids autonomous driving systems in accurately determining vehicle position and driving status. Existing methods typically rely on bounding boxes, failing to capture stop-line shape. Complex road conditions, such as deteriorated markings or intense lighting, challenge these methods’ detection robustness. To address the above problems, we propose a novel representation approach for stop-lines that balances high precision with real-time detection requirements. Inspired by the prevalence of line features in road markings, we propose SLSDNet—a stop-line segment detection network that fuses image data with line features to prioritise line-rich regions for detection. Furthermore, we employ a multi-task learning scheme to extract stop-line features across multiple dimensions and incorporate a verification mechanism to ensure robust performance. In addition, to address the lack of a stop-line dataset, we collected images from multiple sources and published our stop-line dataset at https://github.com/ChengkangLiu/Stop-Line-Dataset. Experimental results demonstrate that our method achieves the best F1-score (97.02) and PR-AUC (0.9684), outperforming state-of-the-art methods. In terms of efficiency, our method achieves real-time operation speed at 109 FPS with 4.41 M parameters, capable of running on devices with limited computing resources.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70194","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144935081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Safer Roads: A Deep Learning and Fuzzy Logic-Based Driver Fatigue Detection System","authors":"Marios Akrivopoulos, Socratis Gkelios, Angelos Amanatiadis, Yiannis Boutalis, Savvas Chatzichristofis","doi":"10.1049/ipr2.70202","DOIUrl":"10.1049/ipr2.70202","url":null,"abstract":"<p>This paper presents a real-time, vision-based framework for detecting driver fatigue using a single low-cost, road-facing camera, eschewing direct visual monitoring of the driver. Unlike conventional systems that rely on in-cabin facial or physiological analysis, the proposed architecture prioritizes privacy by inferring fatigue through vehicle dynamics and road interaction alone. Built upon the YOLOP deep learning model, the system performs lane segmentation and object detection to extract two critical indicators: lane deviation and inter-vehicle distance, both computed from monocular vision. These signals are interpreted via a fuzzy logic module that incorporates trapezoidal, triangular, and Gaussian membership functions, enabling context-sensitive and explainable fatigue assessment. Comparative evaluation of these functions illustrates trade-offs in responsiveness and generalization. Initial validation against expert human assessments shows promising alignment in perceived fatigue levels, suggesting the system can meaningfully approximate fatigue-related judgments. By aligning with emerging ethical frameworks for non-intrusive AI in mobility, the system marks a step toward socially responsible and practically deployable fatigue monitoring in intelligent transportation.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70202","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144929899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A DNA-Dynamic Permutation-Diffusion Algorithm for Image Encryption Using Scaling Chaotification Models and Advanced DNA Operations","authors":"Mustafa Kamil Khairullah, Mohd Zafri Bin Baharuddin, Reema Thabit, Mohammad Ahmed Alomari, Gamal Alkawsi, Faten A. Saif","doi":"10.1049/ipr2.70181","DOIUrl":"10.1049/ipr2.70181","url":null,"abstract":"<p>The rise in cyber threats to digital images over networks is a primary problem for both private and government organisations. Image encryption is considered a useful way to secure the digital image; however, it faces critical challenges such as weak key generation, chosen-plaintext attacks, high overhead, and scalability. To overcome these challenges, this paper proposes the DNA-Dynamic Concurrent Permutation-Diffusion Algorithm (DNA-DCP-DA), which introduces four advanced encryption mechanisms. Firstly, new scaling chaotification models are introduced to enhance chaotic properties, achieving superior results in bifurcation, Lyapunov Exponent (LE), Sample Entropy (SEn), Kolmogorov Entropy (KEn) and key generation. Secondly, a Key Vectorisation Method (KVM) is proposed to optimise execution time and reduce the computational overhead of chaotic map iterations. Thirdly, robust non-commutative DNA operations are introduced, including DNA hybrid and circular shift operations to enhance encryption security. Finally, integrate permutation and dynamic diffusion processes, strengthening security and improving efficiency. To evaluate the proposed algorithm, extensive experiments have been conducted, and results have been compared with the latest encryption algorithms. This shows the proposed encryption algorithm is better, with superior results for correlation results close to zero and Information Entropy (IE) larger than 7.999. The Number of Pixel Change Rates (NPCR) exceeds 99.6%, and the Uniform Average Change Intensity (UACI) is above 33.4%. The algorithm encrypts an image of size 256 × 256 in 0.1255 s, with a key space reaching 2<sup>697</sup>. As a result, the proposed system establishes a new benchmark for secure and efficient image encryption against cyber threats.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70181","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144929901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panoramic Depth and Semantic Estimation With Frequency and Distortion Aware Convolutions","authors":"Bruno Berenguel-Baeta, Jesus Bermudez-Cameo, Jose J. Guerrero","doi":"10.1049/ipr2.70197","DOIUrl":"10.1049/ipr2.70197","url":null,"abstract":"<p>Omnidirectional images reveal advantages when addressing the understanding of the environment due to the 360-degree contextual information. However, the inherent characteristics of the omnidirectional images add additional problems to obtain an accurate detection and segmentation of objects or a good depth estimation. To overcome these problems, we exploit convolutions in the frequency domain, obtaining a wider receptive field in each convolutional layer, and convolutions in the equirectangular projection, to cope with the image distortion. Both convolutions allow to leverage the whole context information from omnidirectional images. Our experiments show that our proposal has better performance on non-gravity-oriented panoramas than state-of-the-art methods and similar performance on oriented panoramas as specific state-of-the-art methods for semantic segmentation and for monocular depth estimation, outperforming the sole other method which provides both tasks.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.2,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70197","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144927302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}