HCTMIF: Hybrid CNN-Transformer Multi Information Fusion Network for Low Light Image Enhancement

IF 2.0 | CAS Tier 4, Computer Science | JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Han Wang, Hengshuai Cui, Jinjiang Li, Zhen Hua
{"title":"HCTMIF: Hybrid CNN-Transformer Multi Information Fusion Network for Low Light Image Enhancement","authors":"Han Wang,&nbsp;Hengshuai Cui,&nbsp;Jinjiang Li,&nbsp;Zhen Hua","doi":"10.1049/ipr2.70127","DOIUrl":null,"url":null,"abstract":"<p>Images captured with poor hardware and insufficient light sources suffer from visual degradation such as low visibility, strong noise, and color casts. Low-light image enhancement methods focus on solving the problem of brightness in dark areas while eliminating the degradation of low-light images. To solve the above problems, we proposed a hybrid CNN-transformer multi information fusion network (HCTMIF) for low-light image enhancement. In this paper, the proposed network architecture is divided into three stages to progressively improve the degraded features of low-light images using the divide-and-conquer principle. First, both the first stage and the second stage adopt the encoder–decoder architecture composed of transformer and CNN to improve the long-distance modeling and local feature extraction capabilities of the network. We add a visual enhancement module (VEM) to the encoding block to further strengthen the network's ability to learn global and local information. In addition, the multi-information fusion block (MIFB) is used to complement the feature maps corresponding to the same scale of the coding block and decoding block of each layer. Second, to improve the mobility of useful information across stages, we designed the self-supervised module (SSM) to readjust the weight parameters to enhance the characterization of local features. Finally, to retain the spatial details of the enhanced images more precisely, we design the detail supplement unit (DSU) to enrich the saturation of the enhanced images. After qualitative and quantitative analyses on multiple benchmark datasets, our method outperforms other methods in terms of visual effects and metric scores.</p>","PeriodicalId":56303,"journal":{"name":"IET Image Processing","volume":"19 1","pages":""},"PeriodicalIF":2.0000,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/ipr2.70127","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IET Image Processing","FirstCategoryId":"94","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1049/ipr2.70127","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Images captured with poor hardware under insufficient lighting suffer from visual degradation such as low visibility, strong noise, and color casts. Low-light image enhancement methods focus on brightening dark regions while removing these degradations. To address these problems, we propose a hybrid CNN-Transformer multi-information fusion network (HCTMIF) for low-light image enhancement. Following a divide-and-conquer principle, the proposed architecture is divided into three stages that progressively restore the degraded features of low-light images. First, both the first and second stages adopt an encoder-decoder architecture combining Transformer and CNN components to improve the network's long-range modeling and local feature extraction capabilities. We add a visual enhancement module (VEM) to each encoding block to further strengthen the network's ability to learn global and local information. In addition, a multi-information fusion block (MIFB) complements the feature maps at the same scale between the encoding and decoding blocks of each layer. Second, to improve the flow of useful information across stages, we design a self-supervised module (SSM) that readjusts weight parameters to enhance the representation of local features. Finally, to preserve the spatial details of the enhanced images more precisely, we design a detail supplement unit (DSU) to enrich their saturation. Qualitative and quantitative analyses on multiple benchmark datasets show that our method outperforms competing methods in both visual quality and metric scores.
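The abstract describes the architecture only at a high level. As a rough illustration of the hybrid CNN-Transformer idea it names, the sketch below pairs a convolutional branch (local feature extraction) with a self-attention branch (long-range modeling) and fuses the two. This is a minimal PyTorch sketch under our own assumptions, not the authors' implementation; every name in it (HybridEncoderBlock, the 1x1 fusion convolution standing in for the MIFB) is hypothetical.

```python
# Hypothetical sketch of a hybrid CNN-Transformer block in the spirit of
# HCTMIF. The paper does not publish this code; module names and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class HybridEncoderBlock(nn.Module):
    """Toy hybrid block: a CNN branch for local detail, an attention branch
    for global context, fused by a 1x1 convolution with a residual skip."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: two 3x3 convolutions extract neighborhood features.
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        # Global branch: multi-head self-attention over flattened pixels.
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fusion: concatenate both branches and project back to `channels`
        # (a crude stand-in for the paper's multi-information fusion block).
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        glob, _ = self.attn(tokens, tokens, tokens)
        glob = glob.transpose(1, 2).reshape(b, c, h, w)
        return x + self.fuse(torch.cat([local, glob], dim=1))

# Quick shape check on a dummy low-light feature map.
if __name__ == "__main__":
    block = HybridEncoderBlock(channels=32)
    out = block(torch.randn(1, 32, 64, 64))
    print(out.shape)  # torch.Size([1, 32, 64, 64])
```

Note that self-attention over full-resolution pixel tokens is quadratic in H*W, so a practical implementation at image scale would likely use windowed or downsampled attention; the sketch keeps global attention only for clarity.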


Source journal: IET Image Processing (Engineering, Electrical & Electronic)
CiteScore: 5.40
Self-citation rate: 8.70%
Articles per year: 282
Review time: 6 months
Journal description: The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.

Principal topics include:
Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.
Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.
Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.
Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.
Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.
Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.

Current Special Issue Calls for Papers:
Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf
AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf
Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf
Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf