IET Image Processing: Latest Articles

RSINS-GS: Reconstruction From Single Image With Noise-Added Strategy and 3D-GS
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-12 | DOI: 10.1049/ipr2.70082
Authors: Shengyi Qian, Lan Cheng, Pengyue Li, Xinying Xu
Abstract: Multi-input reconstruction methods such as 3D-GS and NeRF excel in fidelity, yet they impose stringent requirements on the sequentiality of the input images. In contrast, single-view reconstruction methods are designed to extract certain features of the image even under limited input conditions. However, the majority of current single-view methods demand considerable graphics card performance for rendering at high resolutions, and attaining high-fidelity image reconstruction at lower resolutions remains a formidable challenge. To enhance the fidelity of reconstructed images while respecting the constraints of graphics card performance, we propose a novel pipeline based on novel-view synthesis (NVS), super-resolution (SR) and 3D-GS, named RSINS-GS. First, we introduce a divide-and-conquer strategy tailored to produce pixel-reinforced novel sequential views, rendering the reconstruction result without overburdening the graphics card while maintaining optimal performance and visual fidelity. Furthermore, to enhance the fidelity of reconstructed images in both qualitative and quantitative terms, we integrate 2D prior images with their corresponding geometric structural complements. Additionally, we introduce an innovative, generalised noise-added strategy to refine the overall reconstruction process. Extensive experimental evaluations on the NeRF-synthetic and Google scanned datasets show that our method achieves high-quality results.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70082
Citations: 0
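The abstract does not spell out how the generalised noise-added strategy works. As a purely illustrative sketch of the general idea of perturbing synthesized views before they enter 3D-GS optimisation, one might jitter both the pixels and the camera pose; the function name and noise scales below are hypothetical, not the paper's method:

```python
import torch

def add_view_noise(image, pose, sigma_px=0.01, sigma_pose=0.001):
    """Hypothetical augmentation: jitter a synthesized view and its camera
    pose before 3D-GS optimisation. `image` is a (3, H, W) tensor in [0, 1];
    `pose` is a 4x4 camera-to-world matrix."""
    noisy_image = (image + sigma_px * torch.randn_like(image)).clamp(0.0, 1.0)
    noisy_pose = pose.clone()
    noisy_pose[:3, 3] += sigma_pose * torch.randn(3)  # translation jitter only
    return noisy_image, noisy_pose
```
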
Enhancing Image Decomposition With Large Separable Kernel Attention in Generative Adversarial Networks
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-12 | DOI: 10.1049/ipr2.70089
Authors: Mingzhan Zhao, Ziyun Su, Xiaoyi Du
Abstract: The original generative adversarial network (GAN) model may struggle to adequately capture global information in images, particularly during complex decomposition tasks, leading to limitations in image clarity, detail retention and overall consistency. To address this challenge, we propose the large separable kernel attention generative adversarial network (LSKA-GAN) model, building upon the blind image decomposition network (BIDeN). The LSKA module enhances BIDeN's ability to capture global information, thereby improving the quality and clarity of generated images. Experimental results demonstrate that LSKA-GAN achieves clear improvements in hybrid image decomposition. Compared to BIDeN, LSKA-GAN gains 1.39 dB in peak signal-to-noise ratio (PSNR) and 0.04 in structural similarity index (SSIM). These improvements enable LSKA-GAN to generate clearer images with more complete details, marking a notable advance in image decomposition technology.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70089
Citations: 0
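For readers unfamiliar with large separable kernel attention, here is a minimal PyTorch sketch of an LSKA-style block, assuming the common design in which a large depth-wise kernel is factorised into cascaded horizontal and vertical 1D convolutions plus dilated counterparts; it is not the paper's exact module:

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Sketch of a large-separable-kernel-attention block: the large
    depth-wise kernel is factorised into cascaded horizontal/vertical 1D
    depth-wise convolutions (plus dilated ones), and a pointwise conv
    produces an attention map that re-weights the input."""
    def __init__(self, channels, k=7, dilation=3):
        super().__init__()
        pad, dpad = k // 2, (k // 2) * dilation
        self.dw_h = nn.Conv2d(channels, channels, (1, k), padding=(0, pad), groups=channels)
        self.dw_v = nn.Conv2d(channels, channels, (k, 1), padding=(pad, 0), groups=channels)
        self.dwd_h = nn.Conv2d(channels, channels, (1, k), padding=(0, dpad),
                               dilation=dilation, groups=channels)
        self.dwd_v = nn.Conv2d(channels, channels, (k, 1), padding=(dpad, 0),
                               dilation=dilation, groups=channels)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        attn = self.pw(self.dwd_v(self.dwd_h(self.dw_v(self.dw_h(x)))))
        return x * attn
```

For example, `LSKA(64)(torch.randn(1, 64, 128, 128))` returns a re-weighted feature map of the same shape; the separable factorisation keeps the cost linear in kernel size instead of quadratic.
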
An Unsupervised Image Enhancement Method Based on Adaptation Region Divisions
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-08 | DOI: 10.1049/ipr2.70043
Authors: Kaijun Zhou, Weiyi Yuan, Yemei Qin
Abstract: This paper presents a novel image enhancement method that integrates traditional image processing techniques with deep learning frameworks. Initially, images are transformed from the red, green and blue (RGB) color space to the Lab color space, and the luminance component (L) is extracted to quantify texture. Subsequently, texture complexity is assessed using features derived from the gray-level co-occurrence matrix (GLCM), including contrast, correlation, homogeneity and energy. These features are weighted to compute an overall texture complexity score, which facilitates the segmentation of the image into distinct regions. Regions characterized by simple textures are aggregated into larger segments, whereas regions with complex textures are subdivided into smaller segments. Following segmentation, histogram equalization is applied, along with noise reduction and image enhancement via a convolutional autoencoder model. The model extracts relevant features and reduces dimensionality in the encoder phase, and reconstructs the image through the decoder. This methodology effectively preserves semantic information and enhances image clarity. Experiments are conducted using the ExDark dataset, which comprises twelve categories, and the enhancement results are quantitatively evaluated using image quality metrics such as peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), learned perceptual image patch similarity (LPIPS) and neural image quality evaluator (NIQE). Experimental results demonstrate that the proposed method significantly surpasses existing enhancement techniques in terms of image quality and visual perception, thereby affirming its efficacy in improving the visual quality and detail of low-light images. The implementation code will be made publicly available at: https://github.com/Winnie0320/Image-Enhancement-Method.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70043
Citations: 0
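The texture-complexity scoring described above can be prototyped directly with scikit-image. Below is a minimal sketch; the paper does not publish its feature weights, so the values here are illustrative placeholders:

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.feature import graycomatrix, graycoprops

def texture_complexity(rgb, weights=(0.4, 0.2, 0.2, 0.2)):
    """Score the texture complexity of an RGB image (or region) from GLCM
    statistics on the Lab luminance channel. The weights are illustrative;
    the paper's exact values are not given in the abstract."""
    L = rgb2lab(rgb)[..., 0]                                   # L in [0, 100]
    gray = np.clip(L / 100.0 * 255.0, 0, 255).astype(np.uint8)
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    feats = [graycoprops(glcm, p).mean()
             for p in ("contrast", "correlation", "homogeneity", "energy")]
    return float(np.dot(weights, feats))
```

Regions can then be merged or subdivided by thresholding this score, with high-complexity regions split into smaller segments as the abstract describes.
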
Point Cloud Registration Based on Multiple Neighborhood Feature Difference
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-05 | DOI: 10.1049/ipr2.70097
Authors: Haixia Wang, Teng Wang, Zhiguo Zhang, Xiao Lu, Qiaoqiao Sun, Shibin Song, Jun Nie
Abstract: Dense point cloud registration is a critical problem in computer vision and 3D reconstruction, with widespread applications in scenarios such as robotic navigation, autonomous driving and 3D measurement. However, dense point cloud registration faces significant challenges, including high computational complexity and prolonged processing times. To address these issues, this paper proposes a point cloud registration method based on multiple neighborhood feature difference (MNFD) that employs a coarse-to-fine strategy to effectively enhance both registration efficiency and accuracy. The proposed method consists of two stages: coarse registration and fine registration. In the coarse registration stage, a novel feature point extraction approach based on MNFD is introduced, capable of identifying highly stable and distinctive feature points in the point cloud. These feature points are then utilized in combination with the fast point feature histogram (FPFH) algorithm to achieve an initial alignment between the target and template point clouds. In the fine registration stage, the results from the coarse alignment are refined using algorithms such as iterative closest point (ICP) to ensure both efficiency and precision during the registration process. Experiments conducted on publicly available datasets demonstrate the superiority of the proposed method compared to existing approaches.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70097
Citations: 0
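The coarse-to-fine pipeline (feature-based coarse alignment refined by ICP) maps onto standard Open3D primitives. The sketch below uses generic FPFH + RANSAC for the coarse stage; the paper's MNFD feature-point selection is not reproduced here:

```python
import open3d as o3d

def coarse_to_fine_register(source, target, voxel=0.05):
    """Generic coarse-to-fine registration: FPFH + RANSAC for the coarse
    stage, point-to-point ICP for refinement (Open3D >= 0.15 API)."""
    def preprocess(pcd):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, fpfh

    src, src_fpfh = preprocess(source)
    tgt, tgt_fpfh = preprocess(target)
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, tgt, src_fpfh, tgt_fpfh, True, voxel * 1.5,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel * 1.5)],
        o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    fine = o3d.pipelines.registration.registration_icp(
        source, target, voxel * 0.4, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return fine.transformation
```
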
Realistic Object Reconstruction Under Different Depths Through Light Field Imaging for Virtual Reality
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-04 | DOI: 10.1049/ipr2.70099
Authors: Ali Khan, Md. Moinul Hossain, Alexandra Covaci, Konstantinos Sirlantzis, Qi Qi
Abstract: Virtual reality (VR) immerses users in digital environments and is used in various applications. VR content is created using either computer-generated or conventional imaging. However, conventional imaging captures only 2D spatial information, which limits the realism of VR content. Advanced technologies such as light field (LF) imaging can overcome this limitation by capturing both 2D spatial and 2D angular information in 4D LF images. This paper proposes a depth reconstruction model based on LF imaging to aid in creating realistic VR content. Comprehensive calibrations are performed, including adjustments for camera parameters, depth calibration and field of view (FOV) estimation. Aberration corrections, such as distortion and vignetting correction, are conducted to enhance the reconstruction quality. To achieve realistic scene reconstruction, experiments were conducted on a scenario with multiple objects positioned at three different depths. Quality assessments were carried out to evaluate the reconstruction quality across these varying depths. The results demonstrate that depth reconstruction quality improves with the proposed method, and that the model reduces LF image size and processing time. The depth images reconstructed by the proposed model have the potential to generate realistic VR content and can also facilitate the integration of refocusing capabilities within VR environments.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70099
Citations: 0
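As a rough illustration of the aberration corrections mentioned above, the sketch below undistorts one light-field sub-view with calibrated intrinsics and then compensates vignetting with a flat-field gain map; the gain map and distortion coefficients are assumed calibration inputs, not values from the paper:

```python
import cv2
import numpy as np

def correct_subview(img, K, dist_coeffs, vignette_gain):
    """Undistort one light-field sub-view using calibrated intrinsics `K` and
    lens distortion coefficients, then apply flat-field vignetting
    compensation. `vignette_gain` is an assumed (H, W) gain map measured from
    a uniformly lit white calibration frame; `img` is (H, W, 3) uint8."""
    undistorted = cv2.undistort(img, K, dist_coeffs)
    corrected = undistorted.astype(np.float32) * vignette_gain[..., None]
    return np.clip(corrected, 0, 255).astype(np.uint8)
```
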
EMD-DAAN: A Wasserstein Distance-Based Dynamic Adversarial Domain Adaptation Network Model for Breast Ultrasound Image Classification
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-04 | DOI: 10.1049/ipr2.70096
Authors: Ying Wu, Hao Huang, Bo Xu
Abstract: Breast cancer is commonly diagnosed through ultrasound imaging as a primary method in clinical practice. However, the lack of large annotated datasets of breast ultrasound images, along with issues such as inconsistent marginal and conditional distributions across different datasets, poses significant challenges to both manual and AI-assisted diagnosis. To address these issues, this paper proposes a dynamic adversarial domain adaptation model based on the Wasserstein distance (EMD-DAAN). The EMD-DAAN model enhances the existing dynamic adversarial domain adaptation framework by incorporating an adaptive layer, further aligning the feature distributions of the source and target domain datasets. The Wasserstein distance is employed to optimize this adaptive layer, minimizing the distributional discrepancy between the feature spaces of the two domains by constructing the least-cost transport path. This approach improves the model's cross-domain generalization ability and robustness to noise interference. Through dual feature alignment via the adaptive layer and adversarial learning, the model's classification performance on breast ultrasound images is significantly enhanced. Experimental results demonstrate that the EMD-DAAN model achieves an accuracy of 82.75% on breast ultrasound images, substantially outperforming typical adversarial domain adaptation models such as DAAN.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70096
Citations: 0
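The paper's exact adaptive-layer loss is not given in the abstract. As one common stand-in for a Wasserstein-based feature alignment term, here is a minimal sliced-Wasserstein sketch in PyTorch (it assumes equal source and target batch sizes and is not the paper's formulation):

```python
import torch

def sliced_wasserstein(src_feats, tgt_feats, n_proj=64):
    """W1 distance averaged over random 1D projections of the source and
    target feature batches, both of shape (n, d). Sorting the projections
    gives the optimal 1D transport plan, so each slice is an exact W1."""
    assert src_feats.shape == tgt_feats.shape
    d = src_feats.size(1)
    proj = torch.randn(d, n_proj, device=src_feats.device)
    proj = proj / proj.norm(dim=0, keepdim=True)        # unit directions
    src_sorted = (src_feats @ proj).sort(dim=0).values  # sorted 1D projections
    tgt_sorted = (tgt_feats @ proj).sort(dim=0).values
    return (src_sorted - tgt_sorted).abs().mean()
```

Such a term would be added to the classification and adversarial losses so that minimising it pulls the two feature distributions together along the least-cost transport path.
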
Bidirectionally Guided Multi-Scale Feature Decoding Network for High-Resolution Salient Object Detection
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-02 | DOI: 10.1049/ipr2.70094
Authors: Jiangping Tang, Shuyao Guo, Xiaofei Zhou, Liuxin Bao, Jiyong Zhang, Tong Qiao
Abstract: With the advancement of modern camera technology, the resolution and quality of images have improved significantly. High-resolution images provide more detailed and clearer information, but they may also introduce more noise, which interferes with the detection and localization of salient objects. To address this issue, existing high-resolution salient object detection methods either design complex network structures or adopt multi-modal fusion. However, these approaches often consume significant computing and storage resources, leading to redundant irrelevant features and loss of critical details. In this paper, we propose a bidirectionally guided multi-scale feature decoding network for high-resolution salient object detection. The model incorporates a bidirectional guidance method to exploit the complementarity between encoding and decoding features, thereby achieving a comprehensive combination and enhancement of features. Additionally, in the decoder, multi-scale encoding features are obtained and utilized sequentially to enhance feature learning and improve the accuracy of salient object detection. Specifically, our model consists of an encoder, a guided multi-scale feature enhancement (GMFE) module, a guided feature fusion (GFF) module, and a multi-scale feature decoder (MFD) module. First, multi-scale encoding features are extracted by the encoder. These features are then fed into the GMFE module, which enhances them under the guidance of the saliency map derived from the decoding features of the previous layer. Subsequently, in the GFF module, the enhanced encoding features are fused with the decoding features from the previous layer. Finally, in the MFD module, the bidirectionally guided multi-scale encoding features are integrated to generate an accurate saliency map. Experiments on two high-resolution and two low-resolution datasets demonstrate that our model outperforms existing methods on high-resolution datasets while maintaining competitive performance on low-resolution datasets, underscoring its effectiveness across varying image qualities.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70094
Citations: 0
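The GMFE and GFF modules are described only at a high level. The hypothetical sketch below illustrates the general pattern of saliency-guided encoder-decoder fusion, where a coarse saliency map from the previous decoder stage gates the encoder features before fusion with upsampled decoder features; it assumes matching channel counts and is not the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedFusion(nn.Module):
    """Hypothetical saliency-guided fusion: the previous decoder stage's
    coarse saliency map gates the encoder features, which are then fused
    with the upsampled decoder features. Channel counts assumed equal."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)

    def forward(self, enc_feat, dec_feat, prev_saliency):
        gate = torch.sigmoid(F.interpolate(prev_saliency, size=enc_feat.shape[-2:],
                                           mode="bilinear", align_corners=False))
        enhanced = enc_feat * (1 + gate)               # residual-style re-weighting
        dec_up = F.interpolate(dec_feat, size=enc_feat.shape[-2:],
                               mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([enhanced, dec_up], dim=1))
```
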
Underwater Image Quality Evaluation: A Comprehensive Review
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-05-01 | DOI: 10.1049/ipr2.70068
Authors: Mengjiao Shen, Miao Yang, Jinyang Zhong, Hantao Liu, Can Pan
Abstract: Underwater image quality evaluation (UIQE) is crucial for improving image processing techniques and optimizing the design of imaging systems to obtain object information more accurately. However, existing UIQE methods are designed based on limited images or consider only a few natural scene statistics (NSS) metrics, lacking consideration for generalization across various underwater imaging applications. In this paper, an in-depth review of existing UIQE methods based on evaluation operations is provided, emphasizing the bias present when evaluating UIQE methods with individual metrics. To address this, a novel metric called quadrilateral datum evaluation (QDE) is designed for UIQE methods. It comprehensively considers robustness across different datasets, as well as correlation and ranking consistency with mean opinion scores (MOS). This is the first solution to measure a UIQE method from an all-encompassing visual perspective. Under QDE, UIQE methods characterized by greater feature strength and small imbalance demonstrate good consistency and robustness across multiple aspects, providing a basis for the design of UIQE methods.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70068
Citations: 0
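Correlation and ranking consistency with MOS are conventionally measured with PLCC, SROCC and KRCC, as in the minimal sketch below; the QDE metric additionally folds in cross-dataset robustness, which is not reproduced here:

```python
from scipy.stats import kendalltau, pearsonr, spearmanr

def mos_agreement(scores, mos):
    """Standard agreement statistics between an objective quality score and
    mean opinion scores: linear correlation (PLCC), rank correlation (SROCC)
    and pairwise ranking consistency (KRCC)."""
    return {"PLCC": pearsonr(scores, mos)[0],
            "SROCC": spearmanr(scores, mos)[0],
            "KRCC": kendalltau(scores, mos)[0]}
```
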
The Patch-Based Multi-Task Multi-Scale Reborn Network for Global Gaze Following in 360-Degree Images
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-04-30 | DOI: 10.1049/ipr2.70078
Authors: Jingzhao Dai, Yang Li, Sidan Du
Abstract: In this paper, we propose a global gaze following method using the patch-based multi-task multi-scale reborn network (MMRGaze360), specifically designed for panorama images. Unlike existing approaches that rely on spherical networks or process only local regions, our architecture thoroughly accounts for the distortions introduced by the sphere-to-plane projection, enabling gaze following across entire 360-degree images. MMRGaze360 incorporates field-of-view (360-FoV) and sight line (360-Gaze) generators to model gaze behaviours and scene information in 360-degree images. A multi-task multi-scale module is introduced to capture features from multiple patches centred around the estimated points along the 360-Gaze, using multi-scale attention maps. These features, along with the 360-FoV, are fused to produce a final heatmap. Additionally, we employ multi-layer perceptrons and convolutional networks with the reborn mechanism to enhance information usage and feature representation. Moreover, we establish a novel dataset, SRGaze360, which covers more conditions of the sphere-to-plane distortion. Experimental results on the GazeFollow360 and SRGaze360 datasets demonstrate the superiority of our method over previous works, validating that our approach effectively addresses the limitations of 2D gaze following in handling out-of-frame gaze positions and distortions in 360-degree images.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70078
Citations: 0
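The sphere-to-plane distortion that MMRGaze360 must account for comes from the equirectangular projection. A small sketch of the pixel-to-viewing-direction mapping makes the relationship concrete:

```python
import numpy as np

def equirect_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit 3D viewing direction.
    The horizontal stretching near the poles (where cos(lat) is small) is
    the distortion that 360-degree gaze following has to handle."""
    lon = (u / width - 0.5) * 2.0 * np.pi   # longitude in [-pi, pi]
    lat = (0.5 - v / height) * np.pi        # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])
```

Because gaze is a direction on this sphere, a sight line that leaves one side of the panorama re-enters on the other, which is why 2D gaze-following methods fail on out-of-frame gaze positions.
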
Aluminum Surface Defect Detection Method Based on DAS-YOLO Network
IF 2.0 | CAS Region 4 | Computer Science
IET Image Processing | Pub Date: 2025-04-29 | DOI: 10.1049/ipr2.70090
Authors: Jun Tie, Jiating Ma, Lu Zheng, Chengao Zhu, Mian Wu, HaiJiao Wang, ChongWei Ruan, Shuangyang Li
Abstract: To address the accuracy limitations of current methods in detecting aluminum surface defects, particularly those with small sizes and high variation, an aluminum surface defect detection algorithm named DAS-YOLO, based on an improved YOLOv8n, is proposed. The C2f module in YOLOv8's backbone is enhanced by incorporating DCNv2, which improves the model's ability to handle irregular shapes and geometric transformations during feature extraction. An auxiliary training head (Aux Head) is added to capture multi-scale and multi-level features, significantly boosting small defect detection. Additionally, the traditional CIoU loss function is replaced with the Wise-SIoU loss, accelerating convergence and enhancing both detection and regression accuracy. Experimental results on the Alibaba Tianchi aluminum surface defect dataset show that DAS-YOLO achieves a mean average precision (mAP) of 85.3%. Compared to YOLOv8n, mAP50 improves by 3%, while precision and recall increase by 1.1% and 4.6%, respectively. Furthermore, to validate the model's performance on small defects and its generalization ability, it achieves a detection accuracy of 94.8% on the PCB dataset, with an mAP increase of 3.1% compared to YOLOv8n. These results demonstrate that DAS-YOLO significantly enhances detection accuracy while maintaining speed and exhibits outstanding performance in small defect detection.
Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70090
Citations: 0
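For reference, the baseline CIoU loss that DAS-YOLO replaces is well defined and sketched below; Wise-SIoU additionally re-weights the penalty with angle and shape costs, which is not reproduced here. Boxes are (x1, y1, x2, y2):

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-7):
    """Baseline CIoU loss: 1 - IoU plus a normalised centre-distance penalty
    and an aspect-ratio consistency term (Zheng et al.)."""
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    inter = ((torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0) *
             (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0))
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter + eps
    iou = inter / union
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)        # enclosing box width
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)        # enclosing box height
    rho2 = (((px1 + px2) - (tx1 + tx2)) ** 2 +
            ((py1 + py2) - (ty1 + ty2)) ** 2) / 4.0       # squared centre distance
    v = (4.0 / math.pi ** 2) * (torch.atan((tx2 - tx1) / (ty2 - ty1 + eps)) -
                                torch.atan((px2 - px1) / (py2 - py1 + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)
    return 1.0 - iou + rho2 / (cw ** 2 + ch ** 2 + eps) + alpha * v
```
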