Latest Articles in Image and Vision Computing

SFFEF-YOLO: Small object detection network based on fine-grained feature extraction and fusion for unmanned aerial images
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-26 DOI: 10.1016/j.imavis.2025.105469
Chenxi Bai, Kexin Zhang, Haozhe Jin, Peng Qian, Rui Zhai, Ke Lu
Abstract: Object detection in unmanned aerial vehicle (UAV) images has emerged as a research hotspot, yet it remains a significant challenge due to variable target scales and the high proportion of small objects caused by UAVs' diverse altitudes and viewing angles. To address these issues, we propose SFFEF-YOLO, a novel small object detection network based on fine-grained feature extraction and fusion. First, we introduce a tiny prediction head to replace the large prediction head, enhancing detection accuracy for tiny objects while reducing model complexity. Second, we design a Fine-Grained Information Extraction Module (FIEM) to replace standard convolutions; this module improves feature extraction and reduces information loss during downsampling by utilizing multi-branch operations and SPD-Conv. Third, we develop a Multi-Scale Feature Fusion Module (MFFM), which adds an additional skip-connection branch to the bidirectional feature pyramid network (BiFPN) to preserve fine-grained information and improve multi-scale feature fusion. We evaluated SFFEF-YOLO on the VisDrone2019-DET and UAVDT datasets. Compared to YOLOv8, experimental results demonstrate that SFFEF-YOLO achieves a 9.9% mAP0.5 improvement on VisDrone2019-DET and a 3.6% mAP0.5 improvement on UAVDT. (Volume 156, Article 105469)
Citations: 0
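The FIEM described above relies on SPD-Conv, a space-to-depth rearrangement followed by a non-strided convolution, so downsampling does not discard pixel information. The PyTorch sketch below illustrates that general idea only; the module name, channel widths, and activation are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth downsampling block (illustrative sketch, not the paper's code)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # PixelUnshuffle(2) folds each 2x2 spatial block into the channel axis,
        # halving resolution without dropping pixels (unlike a stride-2 conv).
        self.space_to_depth = nn.PixelUnshuffle(2)
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch * 4, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(inplace=True),  # assumed activation choice
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(self.space_to_depth(x))

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)        # e.g. a high-resolution feature map
    print(SPDConv(64, 128)(x).shape)      # torch.Size([1, 128, 40, 40])
```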
AF2CN: Towards effective demoiréing from multi-resolution images
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-24 DOI: 10.1016/j.imavis.2025.105467
Shitan Asu, Yujin Dai, Shijie Li, Zheng Li
{"title":"AF2CN: Towards effective demoiréing from multi-resolution images","authors":"Shitan Asu,&nbsp;Yujin Dai,&nbsp;Shijie Li,&nbsp;Zheng Li","doi":"10.1016/j.imavis.2025.105467","DOIUrl":"10.1016/j.imavis.2025.105467","url":null,"abstract":"<div><div>Recently, CNN-based methods have gained significant attention for addressing the demoiré task due to their powerful feature extraction capabilities. However, these methods are generally trained on datasets with fixed resolutions, limiting their applicability to diverse real-world scenarios. To address this limitation, we introduce a more generalized task: effective demoiréing across multiple resolutions. To facilitate this task, we constructed MTADM, the first multi-resolution moiré dataset, designed to capture diverse real-world scenarios. Leveraging this dataset, we conducted extensive studies and introduced the Adaptive Fractional Calculus and Adjacency Fusion Convolution Network (AF2CN). Specifically, we employ fractional derivatives to develop an adaptive frequency enhancement module, which refines spatial distribution and texture details in moiré patterns. Additionally, we design a spatial attention gate to enhance deep feature interaction. Extensive experiments demonstrate that AF2CN effectively handles multi-resolution moiré patterns. It significantly outperforms previous state-of-the-art methods on fixed-resolution benchmarks while requiring fewer parameters and achieving lower computational costs.</div></div>","PeriodicalId":50374,"journal":{"name":"Image and Vision Computing","volume":"156 ","pages":"Article 105467"},"PeriodicalIF":4.2,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143487618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
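AF2CN's adaptive frequency enhancement is built on fractional derivatives. As a minimal, hedged illustration of the underlying operator, the sketch below computes Grünwald–Letnikov fractional-difference coefficients and applies them as a 1-D mask along image rows and columns; the order `alpha`, the mask length, and the blending weight are my own illustrative choices, not values from the paper.

```python
import numpy as np
from scipy.ndimage import convolve1d

def gl_coefficients(alpha: float, length: int) -> np.ndarray:
    """Grunwald-Letnikov coefficients w_k = (-1)^k * C(alpha, k) via the standard recurrence."""
    w = np.empty(length)
    w[0] = 1.0
    for k in range(1, length):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w

def fractional_enhance(img: np.ndarray, alpha: float = 0.6, length: int = 5) -> np.ndarray:
    """Apply the fractional-difference mask along rows and columns and blend the
    response back into the image (a crude, illustrative texture boost)."""
    w = gl_coefficients(alpha, length)
    dx = convolve1d(img, w, axis=1, mode="reflect")   # row-wise response
    dy = convolve1d(img, w, axis=0, mode="reflect")   # column-wise response
    return img + 0.5 * (dx + dy)

if __name__ == "__main__":
    img = np.random.rand(64, 64)
    print(fractional_enhance(img).shape, gl_coefficients(0.6, 5).round(3))
```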
NPVForensics: Learning VA correlations in non-critical phoneme–viseme regions for deepfake detection
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-23 DOI: 10.1016/j.imavis.2025.105461
Yu Chen, Yang Yu, Rongrong Ni, Haoliang Li, Wei Wang, Yao Zhao
Abstract: Advanced deepfake technology enables the manipulation of visual and audio signals within videos, leading to visual–audio (VA) inconsistencies. Current multimodal detectors primarily rely on VA contrastive learning to identify such inconsistencies, particularly in critical phoneme–viseme (PV) regions. However, state-of-the-art deepfake techniques have aligned critical PV pairs, thereby reducing the inconsistency traces on which existing methods rely. Due to technical constraints, forgers cannot fully synchronize VA signals in non-critical phoneme–viseme (NPV) regions. Consequently, we exploit inconsistencies in NPV regions as a general cue for deepfake detection. We propose NPVForensics, a two-stage VA correlation learning framework specifically designed to detect VA inconsistencies in NPV regions of deepfake videos. First, to better extract VA unimodal features, we utilize the Swin Transformer to capture long-term global dependencies, and a Local Feature Aggregation (LFA) module aggregates local features along the spatial and channel dimensions, preserving more comprehensive and subtle information. Second, the VA Correlation Learning (VACL) module enhances intra-modal augmentation and inter-modal information interaction, exploring intrinsic correlations between the two modalities. Moreover, Representation Alignment is introduced for real videos to narrow the modal gap and effectively extract VA correlations. Finally, our model is pre-trained on real videos using a self-supervised strategy and fine-tuned for the deepfake detection task. We conducted extensive experiments on six widely used deepfake datasets: FaceForensics++, FakeAVCeleb, Celeb-DF-v2, DFDC, FaceShifter, and DeeperForensics-1.0. Our method achieves state-of-the-art performance in cross-manipulation generalization and robustness, and performs particularly well on VA-coordinated datasets such as A2V, T2V-L, and T2V-S, indicating that VA inconsistencies in NPV regions serve as a general cue for deepfake detection. (Volume 156, Article 105461)
Citations: 0
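The VA correlation learning in NPVForensics is contrastive in spirit: matched visual and audio embeddings of the same clip should agree, mismatched ones should not. The sketch below shows a generic symmetric InfoNCE loss over a batch of visual–audio embedding pairs; the embedding dimension, temperature, and batch pairing are assumptions, and this is not the paper's VACL module.

```python
import torch
import torch.nn.functional as F

def va_infonce(vis: torch.Tensor, aud: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """vis, aud: (B, D) embeddings from the same B clips; matching indices are positives."""
    vis = F.normalize(vis, dim=-1)
    aud = F.normalize(aud, dim=-1)
    logits = vis @ aud.t() / temperature                      # (B, B) cosine-similarity logits
    targets = torch.arange(vis.size(0), device=vis.device)    # diagonal entries are positives
    # Symmetric loss: match video -> audio and audio -> video.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    v, a = torch.randn(8, 256), torch.randn(8, 256)
    print(va_infonce(v, a).item())
```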
Feature Field Fusion for few-shot novel view synthesis
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-22 DOI: 10.1016/j.imavis.2025.105465
Junting Li, Yanghong Zhou, Jintu Fan, Dahua Shou, Sa Xu, P.Y. Mok
Abstract: Reconstructing neural radiance fields from limited or sparse views has shown very promising potential for this field of research. Previous methods usually constrain the reconstruction process with additional priors, e.g., semantic-based or patch-based regularization. Nevertheless, such regularization is applied to the synthesis of unseen views, which may not effectively assist the learning of the field, in particular when the training views are sparse. Instead, we propose a feature Field Fusion (FFusion) NeRF that learns structure and finer details from features extracted by pre-trained neural networks for the sparse training views, and uses them as an extra guide for training the RGB field. With this extra feature guidance, FFusion predicts more accurate color and density when synthesizing novel views. Experimental results show that FFusion can effectively improve the quality of synthesized novel views with only limited or sparse inputs. (Volume 156, Article 105465)
Citations: 0
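FFusion supervises a radiance field with features from pre-trained networks in addition to RGB. A minimal sketch of that idea, assuming a standard NeRF-style MLP: the trunk feeds three heads (density, color, and an auxiliary feature vector), and rendered features are penalized against 2-D features from a pre-trained extractor alongside the RGB loss. Layer widths, the feature dimension, and the loss weight are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FeatureGuidedField(nn.Module):
    """NeRF-style MLP with an auxiliary feature head (illustrative sketch)."""
    def __init__(self, pos_dim: int = 63, feat_dim: int = 64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 256), nn.ReLU(inplace=True),
        )
        self.sigma_head = nn.Linear(256, 1)         # volume density
        self.rgb_head = nn.Linear(256, 3)           # color
        self.feat_head = nn.Linear(256, feat_dim)   # auxiliary feature field

    def forward(self, x_enc: torch.Tensor):
        h = self.trunk(x_enc)
        return self.sigma_head(h), torch.sigmoid(self.rgb_head(h)), self.feat_head(h)

def joint_loss(rgb_pred, rgb_gt, feat_pred, feat_gt, lam: float = 0.1):
    # RGB reconstruction plus a feature-guidance term on rendered features.
    return ((rgb_pred - rgb_gt) ** 2).mean() + lam * ((feat_pred - feat_gt) ** 2).mean()

if __name__ == "__main__":
    field = FeatureGuidedField()
    sigma, rgb, feat = field(torch.randn(1024, 63))   # 1024 positionally encoded samples
    print(sigma.shape, rgb.shape, feat.shape)
```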
PolarDETR: Polar Parametrization for vision-based surround-view 3D detection
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-21 DOI: 10.1016/j.imavis.2025.105438
Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Chang Huang, Wenyu Liu
Abstract: 3D detection based on a surround-view camera system is a critical and promising technique in autonomous driving. In this work, we exploit the view symmetry of the surround-view camera system as an inductive bias to improve optimization and boost performance. We parameterize an object's position in polar coordinates and decompose its velocity along the radial and tangential directions. The perception range, label assignment, and loss function are correspondingly reformulated in the polar coordinate system. This Polar Parametrization scheme establishes explicit associations between image patterns and prediction targets. Based on it, we propose a surround-view 3D detection method, termed PolarDETR. PolarDETR achieves competitive performance on the nuScenes dataset, and thorough ablation studies are provided to validate its effectiveness. (Volume 156, Article 105438)
Citations: 0
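The Polar Parametrization above is plain geometry: express an object's bird's-eye-view position as range and azimuth about the ego vehicle and split its velocity into radial and tangential components. The NumPy sketch below shows that conversion for illustration; it is not the PolarDETR code.

```python
import numpy as np

def to_polar(xy: np.ndarray, vxy: np.ndarray):
    """xy, vxy: (N, 2) positions and velocities in the ego frame (bird's-eye view)."""
    r = np.linalg.norm(xy, axis=1)                   # radial distance to the object
    theta = np.arctan2(xy[:, 1], xy[:, 0])           # azimuth angle
    radial = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    tangential = np.stack([-np.sin(theta), np.cos(theta)], axis=1)
    v_rad = (vxy * radial).sum(axis=1)               # velocity component along the ray
    v_tan = (vxy * tangential).sum(axis=1)           # velocity component across the ray
    return r, theta, v_rad, v_tan

if __name__ == "__main__":
    xy = np.array([[10.0, 5.0]])
    vxy = np.array([[2.0, -1.0]])
    print(to_polar(xy, vxy))
```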
Multispectral images reconstruction using median filtering based spectral correlation
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-21 DOI: 10.1016/j.imavis.2025.105462
Vishwas Rathi, Abhilasha Sharma, Amit Kumar Singh
Abstract: Multispectral images are widely utilized in various computer vision applications because they capture more information than traditional color images. Multispectral imaging systems utilize a multispectral filter array (MFA), an extension of the color filter array found in standard RGB cameras; this approach provides an efficient, cost-effective, and practical way of capturing multispectral images. The primary challenge with MFA-based multispectral imaging systems is the significant undersampling of spectral bands in the mosaicked image: a multispectral mosaic contains more spectral bands than an RGB mosaic, leading to reduced sampling density per band. A multispectral demosaicing algorithm is therefore required to generate the complete multispectral image from the mosaicked image, and the effectiveness of demosaicing algorithms relies heavily on the efficient use of the spatial and spectral correlations inherent in mosaicked images. In the proposed method, a binary-tree-based MFA pattern is employed to capture the mosaicked image. Rather than directly leveraging spectral correlations between bands, median filtering is applied to the spectral differences to mitigate the impact of noise on these correlations. Experimental results demonstrate that the proposed method achieves average improvements of 1.03 dB and 0.92 dB on 5-band to 10-band multispectral images from the widely used TokyoTech and CAVE datasets, respectively. (Volume 156, Article 105462)
Citations: 0
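The core step described above is to median-filter spectral differences rather than use them raw. A hedged sketch of one such refinement step follows: an initially interpolated band is expressed as a difference against a densely sampled guide band, the difference plane is median-filtered to suppress noise, and the guide is added back. The band roles, the filter size, and the synthetic example are assumptions, not the paper's pipeline.

```python
import numpy as np
from scipy.ndimage import median_filter

def refine_band(band_interp: np.ndarray, guide_full: np.ndarray, size: int = 3) -> np.ndarray:
    """band_interp: initial estimate of a sparsely sampled band (H, W);
    guide_full: a densely sampled reference band (H, W)."""
    diff = band_interp - guide_full            # spectral-difference plane
    diff_med = median_filter(diff, size=size)  # robust to noise and outliers
    return guide_full + diff_med               # reconstructed band

if __name__ == "__main__":
    h, w = 32, 32
    guide = np.random.rand(h, w)
    noisy_band = guide + 0.1 + 0.05 * np.random.randn(h, w)  # toy sparse-band estimate
    print(refine_band(noisy_band, guide).shape)
```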
Gait recognition via View-aware Part-wise Attention and Multi-scale Dilated Temporal Extractor
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-20 DOI: 10.1016/j.imavis.2025.105464
Xu Song, Yang Wang, Yan Huang, Caifeng Shan
Abstract: Gait recognition based on silhouette sequences has made significant strides in recent years through the extraction of body shape and motion features. However, challenges remain in achieving accurate gait recognition under covariate changes, such as variations in view and clothing. To tackle these issues, this paper introduces a novel methodology incorporating a View-aware Part-wise Attention (VPA) mechanism and a Multi-scale Dilated Temporal Extractor (MDTE) to enhance gait recognition. Distinct from existing techniques, the VPA mechanism acknowledges the differential sensitivity of various body parts to view changes, applying targeted attention weights at the feature level to improve the efficacy of view-aware constraints in areas of higher saliency or distinctiveness. Concurrently, MDTE employs dilated convolutions across multiple scales to capture the temporal dynamics of gait at diverse levels, thereby refining the motion representation. Comprehensive experiments on the CASIA-B, OU-MVLP, and Gait3D datasets validate the superior performance of our approach. Remarkably, our method achieves a 91.0% accuracy rate under clothing-change conditions on the CASIA-B dataset using solely silhouette information, surpassing current state-of-the-art (SOTA) techniques. These results underscore the effectiveness and adaptability of the proposed strategy in overcoming the complexities of gait recognition amidst covariate changes. (Volume 156, Article 105464)
Citations: 0
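The Multi-scale Dilated Temporal Extractor aggregates gait dynamics at several temporal scales. The sketch below shows one common way to realize that idea: parallel 1-D temporal convolutions with increasing dilation over a per-frame feature sequence, concatenated along channels. The dilation rates, kernel size, and channel split are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleDilatedTemporal(nn.Module):
    """Parallel dilated temporal convolutions over a (B, C, T) feature sequence."""
    def __init__(self, channels: int = 128, dilations=(1, 2, 4, 8)):
        super().__init__()
        assert channels % len(dilations) == 0
        branch_ch = channels // len(dilations)
        self.branches = nn.ModuleList([
            # padding=d keeps the temporal length unchanged for kernel_size=3
            nn.Conv1d(channels, branch_ch, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T) per-frame silhouette features over T frames
        return torch.cat([b(x) for b in self.branches], dim=1)

if __name__ == "__main__":
    seq = torch.randn(2, 128, 30)                     # 30-frame gait sequence
    print(MultiScaleDilatedTemporal()(seq).shape)     # torch.Size([2, 128, 30])
```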
FRoundation: Are foundation models ready for face recognition?
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-19 DOI: 10.1016/j.imavis.2025.105453
Tahar Chettaoui, Naser Damer, Fadi Boutros
Abstract: Foundation models are predominantly trained in an unsupervised or self-supervised manner on highly diverse, large-scale datasets, making them broadly applicable to various downstream tasks. In this work, we investigate for the first time whether such models are suitable for the specific domain of face recognition (FR). We further propose and demonstrate the adaptation of these models for FR across different levels of data availability, including synthetic data. Extensive experiments are conducted on multiple foundation models and datasets of varying scales for training and fine-tuning, with evaluation on a wide range of benchmarks. Our results indicate that, despite their versatility, pre-trained foundation models tend to underperform in FR compared with similar architectures trained specifically for this task. However, fine-tuning foundation models yields promising results, often surpassing models trained from scratch, particularly when training data is limited. For example, after fine-tuning on only 1K identities, DINOv2 ViT-S achieved an average verification accuracy of 87.10% on the LFW, CALFW, CPLFW, CFP-FP, and AgeDB30 benchmarks, compared to 64.70% for the same model without fine-tuning, while training the same architecture, ViT-S, from scratch on 1K identities reached 69.96%. With access to larger-scale FR training datasets, these performances reach 96.03% and 95.59% for the DINOv2 and CLIP ViT-L models, respectively. Compared with ViT-based architectures trained from scratch for FR, fine-tuned foundation models of the same architecture achieve similar performance while requiring lower training computational costs and not relying on the assumption of extensive data availability. We further demonstrate the use of synthetic face data, showing improved performance over both pre-trained foundation and ViT models. Additionally, we examine demographic biases, noting slightly higher biases in certain settings when using foundation models compared to models trained from scratch. We release our code and pre-trained model weights at github.com/TaharChettaoui/FRoundation. (Volume 156, Article 105453)
Citations: 0
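The FRoundation study fine-tunes pre-trained foundation backbones such as DINOv2 ViT-S for face recognition. The sketch below outlines one plausible setup under stated assumptions: the backbone is loaded through the torch.hub entrypoint published by the DINOv2 repository, an embedding head and identity classifier are attached, and everything is updated with a small learning rate. The head size, the plain cross-entropy stand-in for a margin loss, and the optimizer settings are assumptions, not the paper's recipe.

```python
import torch
import torch.nn as nn

# Requires network access on first run; downloads the DINOv2 ViT-S/14 weights.
backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")

embed = nn.Linear(384, 512)                      # 384 = ViT-S CLS feature size
classifier = nn.Linear(512, 1000, bias=False)    # e.g. 1K training identities

params = list(backbone.parameters()) + list(embed.parameters()) + list(classifier.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-5)   # small LR for full fine-tuning (assumed)
criterion = nn.CrossEntropyLoss()                # stand-in for a margin loss such as ArcFace

def training_step(images: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # images: (B, 3, 224, 224); 224 is divisible by the ViT-S/14 patch size.
    features = backbone(images)                  # (B, 384) global features
    logits = classifier(nn.functional.normalize(embed(features), dim=-1))
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```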
Vehicle re-identification with large separable kernel attention and hybrid channel attention
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-17 DOI: 10.1016/j.imavis.2025.105442
Xuezhi Xiang, Zhushan Ma, Xiaoheng Li, Lei Zhang, Xiantong Zhen
Abstract: With the rapid development of intelligent transportation systems and the popularity of smart city infrastructure, vehicle re-identification (Re-ID) has become an important research field. The vehicle Re-ID task faces a key challenge: the high similarity between different vehicles. Existing methods use additional detection or segmentation models to extract differentiated local features, but they either rely on additional annotations or greatly increase the computational cost. Using attention mechanisms to capture global and local features is crucial to addressing the high inter-class similarity in vehicle Re-ID. In this paper, we propose LSKA-ReID, which combines large separable kernel attention and hybrid channel attention. Specifically, large separable kernel attention (LSKA) combines the advantages of self-attention with those of convolution, allowing more comprehensive extraction of global and local vehicle features; we also compare the performance of LSKA and large kernel attention (LKA) on the vehicle Re-ID task. In addition, we introduce hybrid channel attention (HCA), which combines channel attention with spatial information so that the model can better focus on informative channels and feature regions while ignoring background and other distracting information. Extensive experiments on three popular datasets, VeRi-776, VehicleID, and VERI-Wild, demonstrate the effectiveness of LSKA-ReID. In particular, on the VeRi-776 dataset, mAP reaches 86.78% and Rank-1 reaches 98.09%. (Volume 155, Article 105442)
Citations: 0
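Large separable kernel attention factorizes the big depthwise kernels of large kernel attention into cascaded horizontal and vertical 1-D depthwise convolutions, and uses the result to modulate the input. The sketch below follows one common LSKA configuration (a 5x5 kernel plus a 7x7 dilated kernel, each split into 1-D parts); the exact kernel sizes and dilation used in LSKA-ReID are not taken from the paper.

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Large separable kernel attention block (illustrative configuration)."""
    def __init__(self, dim: int):
        super().__init__()
        # 5x5 depthwise conv factorized into 1x5 and 5x1
        self.dw_h = nn.Conv2d(dim, dim, (1, 5), padding=(0, 2), groups=dim)
        self.dw_v = nn.Conv2d(dim, dim, (5, 1), padding=(2, 0), groups=dim)
        # 7x7 depthwise conv with dilation 3, factorized the same way
        self.dwd_h = nn.Conv2d(dim, dim, (1, 7), padding=(0, 9), dilation=3, groups=dim)
        self.dwd_v = nn.Conv2d(dim, dim, (7, 1), padding=(9, 0), dilation=3, groups=dim)
        self.pw = nn.Conv2d(dim, dim, 1)   # pointwise mixing across channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw(self.dwd_v(self.dwd_h(self.dw_v(self.dw_h(x)))))
        return x * attn                    # attention-style modulation of the input

if __name__ == "__main__":
    f = torch.randn(1, 256, 16, 16)
    print(LSKA(256)(f).shape)              # torch.Size([1, 256, 16, 16])
```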
Innovative underwater image enhancement algorithm: Combined application of adaptive white balance color compensation and pyramid image fusion to submarine algal microscopy
IF 4.2, CAS Zone 3, Computer Science
Image and Vision Computing Pub Date : 2025-02-16 DOI: 10.1016/j.imavis.2025.105466
Yi-Ning Fan, Geng-Kun Wu, Jia-Zheng Han, Bei-Ping Zhang, Jie Xu
Abstract: Microscopic images of harmful algal blooms (HABs) collected in real time in coastal areas often suffer from significant color deviations and loss of fine cellular detail. To address these issues, this paper proposes an innovative method for enhancing underwater microscopic images of marine algae based on Adaptive White Balance Color Compensation (AWBCC) and Image Pyramid Fusion (IPF). First, an effective Adaptive Cyclic Channel Compensation (ACCC) algorithm is proposed based on the gray-world assumption to enhance the color of underwater images. Then, the Maximum Color Channel Attention Guidance (MCCAG) method is employed to reduce the color disturbance caused by ignoring light absorption. The paper also introduces an Empirical Contrast Enhancement (ECH) module based on multi-scale IPF tailored to underwater microscopic images of algae, which is used for global contrast enhancement, texture detail enhancement, and noise control. Second, the paper proposes a network based on a diffusion probability model for edge detection in HABs that considers both high-order and low-order features extracted from the images; this enriches the semantic information of the feature maps and improves edge detection accuracy, achieving an ODS of 0.623 and an OIS of 0.683. Experimental evaluations demonstrate that the proposed underwater algae microscopic image enhancement method amplifies local texture features while preserving the original image structure, which significantly improves the accuracy of edge detection and key-point matching. Compared with several state-of-the-art underwater image enhancement methods, the approach achieves the highest values in contrast, average gradient, entropy, and Enhancement Measure Estimation (EME), and also delivers competitive results in terms of image noise control. (Volume 156, Article 105466)
Citations: 0
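The ACCC color compensation above starts from the gray-world assumption. As a hedged, minimal illustration of that starting point, the sketch below first compensates the strongly attenuated red channel using the green channel and then rescales each channel so its mean matches the global gray level; the compensation factor and formula are illustrative and not the paper's ACCC algorithm.

```python
import numpy as np

def gray_world_balance(img: np.ndarray, compensate_red: float = 0.3) -> np.ndarray:
    """img: float RGB image in [0, 1], shape (H, W, 3)."""
    out = img.astype(np.float64).copy()
    r_mean, g_mean = out[..., 0].mean(), out[..., 1].mean()
    # Underwater red light is strongly absorbed; borrow from the green channel
    # (illustrative compensation, weighted toward dark red pixels).
    out[..., 0] += compensate_red * (g_mean - r_mean) * (1.0 - out[..., 0]) * out[..., 1]
    # Gray-world assumption: rescale every channel so its mean equals the global mean.
    gray = out.reshape(-1, 3).mean()
    for c in range(3):
        mean_c = out[..., c].mean()
        if mean_c > 0:
            out[..., c] *= gray / mean_c
    return np.clip(out, 0.0, 1.0)

if __name__ == "__main__":
    img = np.random.rand(32, 32, 3) * np.array([0.3, 0.8, 0.9])  # blue-green color cast
    print(gray_world_balance(img).reshape(-1, 3).mean(axis=0).round(3))
```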