GPLM: Enhancing underwater images with Global Pyramid Linear Modulation

IF 4.2 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Image and Vision Computing | Pub Date: 2025-02-01 | DOI: 10.1016/j.imavis.2024.105361

Jinxin Shao, Haosu Zhang, Jianming Miao

Volume 154, Article 105361. Full text: https://www.sciencedirect.com/science/article/pii/S0262885624004669

Citations: 0

Abstract

Underwater imagery often suffers from color distortion, low contrast, blurring, and noise caused by the absorption and scattering of light in water. These degradations complicate visual interpretation and hinder subsequent image processing. Existing methods either struggle to address the complex, spatially varying degradations without prior environmental knowledge or produce unnatural enhancements. To overcome these limitations, we propose Global Pyramid Linear Modulation (GPLM), a method that integrates physical degradation modeling with deep learning for underwater image enhancement. Our approach extends Feature-wise Linear Modulation (FiLM) to a four-dimensional structure, enabling fine-grained, spatially adaptive modulation of feature maps. By combining a feature pyramid architecture with self-attention and feature fusion mechanisms, it captures multi-scale contextual information and models complex degradations effectively. We validate the method by integrating it into the MixDehazeNet model and running experiments on benchmark datasets. On the EUVP-515-test dataset, it raises the Peak Signal-to-Noise Ratio (PSNR) from 28.6 dB to 30.6 dB. On datasets with ground truth, it consistently outperforms recent state-of-the-art methods by more than 3 dB in PSNR; on datasets without ground truth, it improves the Underwater Image Quality Measure (UIQM) by more than 1. We also demonstrate the practical applicability of the method on a real-world underwater dataset, where it achieves substantial improvements in image quality metrics and visually compelling results. These experiments confirm that the method addresses the limitations of existing techniques by adaptively modeling complex underwater degradations, highlighting its potential for underwater image enhancement tasks.
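To make the modulation idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It contrasts classic FiLM (one scale and shift per channel) with a four-dimensional variant in which the scale and shift tensors match the full feature-map shape, and it includes the standard PSNR formula used to read the reported dB gains. The function names, tensor layout (batch, channel, height, width), and the way the modulation parameters are supplied are all assumptions for illustration; the abstract does not specify how GPLM predicts these parameters from the pyramid and attention branches.

```python
import numpy as np


def film(features, gamma, beta):
    """Classic FiLM: one scale (gamma) and one shift (beta) per channel.

    features: array of shape (B, C, H, W)
    gamma, beta: arrays of shape (B, C), broadcast over H and W
    """
    return gamma[:, :, None, None] * features + beta[:, :, None, None]


def spatial_modulation(features, gamma, beta):
    """Hypothetical four-dimensional variant: gamma and beta share the
    (B, C, H, W) shape of the features, so each position in each channel
    gets its own scale and shift."""
    assert gamma.shape == features.shape == beta.shape
    return gamma * features + beta


def psnr(reference, estimate, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, max_value]."""
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((1, 3, 4, 4))   # toy feature map
    g = rng.random((1, 3))         # per-channel scales
    b = rng.random((1, 3))         # per-channel shifts

    # Per-channel FiLM is the special case where the 4-D parameters are
    # constant across the spatial dimensions.
    g4 = np.broadcast_to(g[:, :, None, None], x.shape)
    b4 = np.broadcast_to(b[:, :, None, None], x.shape)
    print(np.allclose(film(x, g, b), spatial_modulation(x, g4, b4)))

    # A gain from 28.6 dB to 30.6 dB corresponds to this reduction factor
    # in mean squared error.
    print(10 ** ((30.6 - 28.6) / 10))
```

Because PSNR is logarithmic in the mean squared error, the reported 2 dB improvement on EUVP-515-test corresponds to the error being divided by roughly 1.58, which is what the last line computes.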



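Source journal: Image and Vision Computing (Engineering Technology: Engineering, Electrical and Electronic)

CiteScore: 8.50
Self-citation rate: 8.50%
Articles published: 143
Review time: 7.8 months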

Journal description: Image and Vision Computing has as a primary aim the provision of an effective medium of interchange for the results of high quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real world scenes. It seeks to strengthen a deeper understanding in the discipline by encouraging the quantitative comparison and performance evaluation of the proposed methodology. The coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically-inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, image databases.