Title: Ultra-high-definition underwater image enhancement via dual-domain interactive transformer network
Authors: Weiwei Li, Feiyuan Cao, Yiwen Wei, Zhenghao Shi, Xiuyi Jia
Journal: International Journal of Machine Learning and Cybernetics (Impact Factor 3.1; JCR Q2, Computer Science, Artificial Intelligence)
Published: 2024-09-15
DOI: 10.1007/s13042-024-02379-x (https://doi.org/10.1007/s13042-024-02379-x)
Citations: 0
Abstract
Ultra-high-definition (UHD) imaging devices are increasingly used for underwater image acquisition. However, due to light scattering and underwater impurities, UHD underwater images often suffer from color deviations and blurred edges. Many studies have attempted to enhance underwater images by integrating frequency-domain and spatial-domain information. Nonetheless, these approaches often fuse dual-domain features interactively only in the final fusion module, neglecting the complementary and guiding roles that frequency-domain and spatial-domain features can play for each other. Additionally, the two domains' features are extracted independently, so the features these methods obtain from each domain have pronounced strengths and weaknesses. Consequently, these methods place high demands on the feature fusion capability of the fusion module. Yet in order to handle UHD underwater images, the fusion modules in these methods often stack only a limited number of convolution and activation operations. This limitation results in insufficient fusion capability, leading to defects in the restoration of edges and colors. To address these issues, we develop a dual-domain interaction network for enhancing UHD underwater images. The network lets frequency-domain and spatial-domain features complement and guide each other's extraction, and integrates the dual-domain features throughout the model to better recover image details and colors. Specifically, the network has a U-shaped structure in which each layer is composed of dual-domain interaction transformer blocks containing interactive multi-head attention and interactive simple gate feed-forward networks. The interactive multi-head attention captures local interaction features of frequency-domain and spatial-domain information using a convolution operation, followed by a multi-head attention operation that extracts global information from the mixed features. The interactive simple gate feed-forward network further enhances the model's dual-domain interaction capability and cross-dimensional feature extraction ability, resulting in clearer edges and more realistic colors. Experimental results demonstrate that our proposal performs significantly better than existing methods in enhancing underwater images.
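The block structure described in the abstract (a convolution that mixes the two domains locally, multi-head attention over the mixed features, then a gated feed-forward network) can be illustrated with a small NumPy sketch. This is not the authors' implementation. Every function name, the log-amplitude FFT used as the frequency-domain view, the 3x3 kernel size, the residual connections, and the channel-split form of the gate are assumptions made for illustration; the abstract does not specify them. The weights are random, and attention is computed over all pixels, which is only feasible for the tiny feature map used here.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def conv3x3(x, w):
    """3x3 convolution, zero padding, stride 1. x: (C_in, H, W); w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out


def frequency_features(x):
    """Assumed frequency-domain view: log-amplitude of a per-channel 2D FFT."""
    return np.log1p(np.abs(np.fft.fft2(x, axes=(-2, -1))))


def interactive_attention(spatial, freq, p, heads=2):
    """Convolution mixes both domains locally; multi-head attention then relates all positions."""
    c, h, w = spatial.shape
    mixed = conv3x3(np.concatenate([spatial, freq], axis=0), p["mix"])  # (C, H, W)
    tokens = mixed.reshape(c, h * w).T  # (N, C): one token per pixel
    d = c // heads

    def split(t):  # (N, C) -> (heads, N, d)
        return t.reshape(h * w, heads, d).transpose(1, 0, 2)

    q, k, v = (split(tokens @ p[name]) for name in ("q", "k", "v"))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(h * w, c)  # merge heads back to (N, C)
    return out.T.reshape(c, h, w)


def simple_gate_ffn(x, p):
    """Expand channels, split them in half, multiply the halves (the gate), project back."""
    hidden = np.einsum("oc,chw->ohw", p["ffn_in"], x)  # (2C, H, W)
    a, b = np.split(hidden, 2, axis=0)
    return np.einsum("oc,chw->ohw", p["ffn_out"], a * b)  # (C, H, W)


def dual_domain_block(x, p, heads=2):
    """One block, with assumed residual connections around attention and feed-forward."""
    x = x + interactive_attention(x, frequency_features(x), p, heads)
    return x + simple_gate_ffn(x, p)


def init_params(c, rng):
    """Random weights so the sketch runs; a trained model would learn these."""
    s = 0.1
    return {
        "mix": s * rng.standard_normal((c, 2 * c, 3, 3)),
        "q": s * rng.standard_normal((c, c)),
        "k": s * rng.standard_normal((c, c)),
        "v": s * rng.standard_normal((c, c)),
        "ffn_in": s * rng.standard_normal((2 * c, c)),
        "ffn_out": s * rng.standard_normal((c, c)),
    }
```

Calling `dual_domain_block(x, init_params(4, np.random.default_rng(0)))` on a `(4, 6, 6)` array returns an array of the same shape, so blocks of this kind could be stacked inside a U-shaped encoder-decoder. The gate here multiplies two halves of the same tensor and replaces a separate activation function; how the paper makes the gate interactive across domains is not stated in the abstract, so that part is left out.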
About the journal:
Cybernetics is concerned with describing complex interactions and interrelationships between systems which are omnipresent in our daily life. Machine Learning discovers fundamental functional relationships between variables and ensembles of variables in systems. The merging of the disciplines of Machine Learning and Cybernetics is aimed at the discovery of various forms of interaction between systems through diverse mechanisms of learning from data.
The International Journal of Machine Learning and Cybernetics (IJMLC) focuses on the key research problems emerging at the junction of machine learning and cybernetics and serves as a broad forum for rapid dissemination of the latest advancements in the area. The emphasis of IJMLC is on the hybrid development of machine learning and cybernetics schemes inspired by different contributing disciplines such as engineering, mathematics, cognitive sciences, and applications. New ideas, design alternatives, implementations and case studies pertaining to all the aspects of machine learning and cybernetics fall within the scope of the IJMLC.
Key research areas to be covered by the journal include:
Machine Learning for modeling interactions between systems
Pattern Recognition technology to support discovery of system-environment interaction
Control of system-environment interactions
Biochemical interaction in biological and biologically-inspired systems
Learning for improvement of communication schemes between systems