{"title":"Empower network to comprehend: Semantic guided and attention fusion GAN for underwater image enhancement","authors":"Xiao Liu , Ziwei Liu , Li Yu","doi":"10.1016/j.image.2025.117271","DOIUrl":"10.1016/j.image.2025.117271","url":null,"abstract":"<div><div>In fields such as underwater exploration, acquiring clear and precise imagery is paramount for gathering diverse underwater information. Consequently, the development of robust underwater image enhancement (UIE) algorithms is of great significance. Leveraged by advancements in deep learning, UIE research has achieved substantial progress. Addressing the scarcity of underwater datasets and the imperative to refine the quality of enhanced reference images, this paper introduces a novel semantic-guided network architecture, termed SGAF-GAN. This model utilizes semantic information as an ancillary supervisory signal within the UIE network, steering the enhancement process towards semantically relevant areas while ameliorating issues with image edge blurriness. Moreover, in scenarios where rare image degradation co-occurs with semantically pertinent features, semantic information furnishes the network with prior knowledge, bolstering model performance and generalizability. This study integrates a feature attention fusion mechanism to preserve context information and amplify the influence of semantic guidance during cross-domain integration. Given the variable degradation in underwater images, the combination of spatial and channel attention empowers the network to assign more accurate weights to the most adversely affected regions, thereby elevating the overall image enhancement efficacy. Empirical evaluations demonstrate that SGAF-GAN excels across various real underwater datasets, aligning with human visual perception standards. On the SUIM dataset, SGAF-GAN achieves a PSNR of 24.30 and an SSIM of 0.9144.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117271"},"PeriodicalIF":3.4,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAR-CDCFRN: A novel SAR despeckling approach utilizing correlated dual channel feature-based residual network","authors":"Anirban Saha, Arihant K.R., Suman Kumar Maji","doi":"10.1016/j.image.2025.117267","DOIUrl":"10.1016/j.image.2025.117267","url":null,"abstract":"<div><div>As a result of the increasing need for capturing and processing visual data of the Earth’s surface, Synthetic Aperture Radar (SAR) technology has been widely embraced by all space research organisations. The primary drawback in the acquired SAR visuals (images) is the presence of unwanted granular noise, called “speckle”, which poses a limitation to their processing and analysis. Therefore removing this unwanted speckle noise from the captured SAR visuals, a process known as despeckling, becomes an important task. This article introduces a new despeckling residual network named SAR-CDCFRN. This network simultaneously extracts speckle components from both the spatial and inverse spatial channels. The extracted features are then correlated by a dual-layer attention block and further processed to predict the distribution of speckle in the input noisy image. The predicted distribution, which is the residual noise, is then mapped with the input noisy SAR data to generate a despeckled output image. Experimental results confirm the superiority of the proposed despeckling model over other existing technologies in the literature.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117267"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143128020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Approximation-based energy-efficient cyber-secured image classification framework","authors":"M.A. Rahman , Salma Sultana Tunny , A.S.M. Kayes , Peng Cheng , Aminul Huq , M.S. Rana , Md. Rashidul Islam , Animesh Sarkar Tusher","doi":"10.1016/j.image.2025.117261","DOIUrl":"10.1016/j.image.2025.117261","url":null,"abstract":"<div><div>In this work, an energy-efficient cyber-secured framework for deep learning-based image classification is proposed. This simultaneously addresses two major concerns in relevant applications, which are typically handled separately in the existing works. An image approximation-based data storage scheme to improve the efficiency of memory usage while reducing energy consumption at both the source and user ends is discussed. Also, the proposed framework mitigates the impacts of two different adversarial attacks, notably retaining performance. The experimental analysis signifies the academic and industrial importance of this work as it demonstrates reductions of 62.5% in energy consumption for image classification when accessing memory and in the effective memory sizes of both ends by the same amount. During the improvement of memory efficiency, the multi-scale structural similarity index measure (MS-SSIM) is found to be the optimum image quality assessment method among different similarity-based metrics for the image classification task with approximated images and an average image quality of 0.9449 in terms of MS-SSIM is maintained. Also, a comparative analysis of three different classifiers with different depths indicates that the proposed scheme maintains up to 90.17% of original classification accuracy under normal and cyber-attack scenarios, effectively defending against untargeted and targeted white-box adversarial attacks with varying parameters.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117261"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143127943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spiking two-stream methods with unsupervised STDP-based learning for action recognition","authors":"Mireille El-Assal, Pierre Tirilly, Ioan Marius Bilasco","doi":"10.1016/j.image.2025.117263","DOIUrl":"10.1016/j.image.2025.117263","url":null,"abstract":"<div><div>Video analysis is a computer vision task that is useful for many applications like surveillance, human-machine interaction, and autonomous vehicles. Deep learning methods are currently the state-of-the-art methods for video analysis. Particularly, two-stream methods, which leverage both spatial and temporal information, have proven to be valuable in Human Action Recognition (HAR). However, they have high computational costs, and need a large amount of labeled data for training. In addressing these challenges, this paper adopts a more efficient approach by leveraging Convolutional Spiking Neural Networks (CSNNs) trained with the unsupervised Spike Timing-Dependent Plasticity (STDP) learning rule for action classification. These networks represent the information using asynchronous low-energy spikes, which allows the network to be more energy efficient when implemented on neuromorphic hardware. Furthermore, learning visual features with unsupervised learning reduces the need for labeled data during training, making the approach doubly advantageous. Therefore, we explore transposing two-stream convolutional neural networks into the spiking domain, where we train each stream with the unsupervised STDP learning rule. We investigate the performance of these networks in video analysis by employing five distinct configurations for the temporal stream, and evaluate them across four benchmark HAR datasets. In this work, we show that two-stream CSNNs can successfully extract spatio-temporal information from videos despite using limited training data, and that the spiking spatial and temporal streams are complementary. We also show that replacing a dedicated temporal stream with a spatio-temporal one within a spiking two-stream architecture leads to information redundancy that hinders the performance.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117263"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional Laplacian pyramid networks for exposure correction","authors":"Mengyuan Huang , Kan Chang , Qingpao Qin , Yahui Tang , Guiqing Li","doi":"10.1016/j.image.2025.117276","DOIUrl":"10.1016/j.image.2025.117276","url":null,"abstract":"<div><div>Improper exposures greatly degenerate the visual quality of images. Correcting various exposure errors in a unified framework is challenging as it requires simultaneously handling global attributes and local details under different exposure conditions. In this paper, we propose a conditional Laplacian pyramid network (CLPN) for correcting different exposure errors in the same framework. It applies Laplacian pyramid to decompose an improperly exposed image into a low-frequency (LF) component and several high-frequency (HF) components, and then enhances the decomposed components in a coarse-to-fine manner. To consistently correct a wide range of exposure errors, a conditional feature extractor is designed to extract the conditional feature from the given image. Afterwards, the conditional feature is used to guide the refinement of LF features, so that a precisely correction for illumination, contrast and color tone can be obtained. As different frequency components exhibit pixel-wise correlations, the frequency components in lower pyramid layers are used to support the reconstruction of the HF components in higher layers. By doing so, fine details can be effectively restored, while noises can be well suppressed. Extensive experiments show that our method is more effective than state-of-the-art methods on correcting various exposure conditions ranging from severe underexposure to intense overexposure.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117276"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ATM-DEN: Image Inpainting via attention transfer module and Decoder-Encoder network","authors":"Siwei Zhang , Yuantao Chen","doi":"10.1016/j.image.2025.117268","DOIUrl":"10.1016/j.image.2025.117268","url":null,"abstract":"<div><div>The current prevailing techniques for image restoration predominantly employ self-encoding and decoding networks, aiming to reconstruct the original image during the decoding phase utilizing the compressed data captured during encoding. Nevertheless, the self-encoding network inherently suffers from information loss during compression, rendering it challenging to achieve nuanced restoration outcomes solely reliant on compressed information, particularly manifesting as blurred imagery and distinct edge artifacts around the restored areas. To mitigate this issue of insufficient image information utilization, we introduce a Multi-Stage Decoding Network in this study. This network leverages multiple decoders to decode and integrate features from each layer of the encoding stage, thereby enhancing the exploitation of encoder features across various scales. Subsequently, a feature mapping is derived that more accurately captures the content of the impaired region. Comparative experiments conducted on globally recognized datasets demonstrate that MSDN achieves a notable enhancement in the visual quality of restored images.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117268"},"PeriodicalIF":3.4,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143127942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image super-resolution based on multifractals in transfer domain","authors":"Xunxiang Yao, Qiang Wub, Peng Zhange, Fangxun Baod","doi":"10.1016/j.image.2024.117221","DOIUrl":"10.1016/j.image.2024.117221","url":null,"abstract":"<div><div>The goal of image super-resolution technique is to reconstruct high-resolution image with fine texture details from its low-resolution version.On Fourier domain,such fine details are more related to the information in the highfrequency spectrum. Most of existing methods do not have specific modules to handle such high-frequency information adaptively. Thus, they cause edge blur or texture disorder. To tackle the problems, this work explores image super-resolution on multiple sub-bands of the corresponding image, which are generated by NonSubsampled Contourlet Transform (NSCT). Different sub-bands hold the information of different frequency which is then related to the detailedness of information of the given low-resolution image.In this work, such image information detailedness is formulated as image roughness. Moreover, fractals analysis is applied to each sub-band image. Since fractals can mathematically represent the image roughness, it then is able to represent the detailedness (i.e. various frequency of image information). Overall, a multi-fractals formulation is established based on multiple sub-bands image. On each sub-band, different fractals representation is created adaptively. In this way, the image super-resolution process is transformed into a multifractal optimization problem. The experiment result demonstrates the effectiveness of the proposed method in recovering high-frequency details.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117221"},"PeriodicalIF":3.4,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143420492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Middle-output deep image prior for blind hyperspectral and multispectral image fusion","authors":"Jorge Bacca , Christian Arcos , Juan Marcos Ramírez , Henry Arguello","doi":"10.1016/j.image.2024.117247","DOIUrl":"10.1016/j.image.2024.117247","url":null,"abstract":"<div><div>Obtaining a low-spatial-resolution hyperspectral image (HS) or low-spectral-resolution multispectral (MS) image from a high-resolution (HR) spectral image is straightforward with knowledge of the acquisition models. However, the reverse process, from HS and MS to HR, is an ill-posed problem known as spectral image fusion. Although recent fusion techniques based on supervised deep learning have shown promising results, these methods require large training datasets involving expensive acquisition costs and long training times. In contrast, unsupervised HS and MS image fusion methods have emerged as an alternative to data demand issues; however, they rely on the knowledge of the linear degradation models for optimal performance. To overcome these challenges, we propose the Middle-Output Deep Image Prior (MODIP) for unsupervised blind HS and MS image fusion. MODIP is adjusted for the HS and MS images, and the HR fused image is estimated at an intermediate layer within the network. The architecture comprises two convolutional networks that reconstruct the HR spectral image from HS and MS inputs, along with two networks that appropriately downscale the estimated HR image to match the available MS and HS images, learning the non-linear degradation models. The network parameters of MODIP are jointly and iteratively adjusted by minimizing a proposed loss function. This approach can handle scenarios where the degradation operators are unknown or partially estimated. To evaluate the performance of MODIP, we test the fusion approach on three simulated spectral image datasets (Pavia University, Salinas Valley, and CAVE) and a real dataset obtained through a testbed implementation in an optical lab. Extensive simulations demonstrate that MODIP outperforms other unsupervised model-based image fusion methods by up to 6 dB in PNSR.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"132 ","pages":"Article 117247"},"PeriodicalIF":3.4,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143148378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}