{"title":"Optimizing multi-task network with learned prototypes for weakly supervised semantic segmentation","authors":"Lei Zhou , Jiasong Wang , Jing Luo , Yuheng Guo , Xiaoxiao Li","doi":"10.1016/j.image.2025.117272","DOIUrl":"10.1016/j.image.2025.117272","url":null,"abstract":"<div><div>Weakly supervised semantic segmentation (WSSS) presents a challenging task wherein semantic objects are extracted solely through the utilization of image-level labels as supervision. One common category of state-of-the-art solutions depends on the generation of pseudo pixel-level annotations via the use of localization maps. Nevertheless, in the majority of such solutions, the quality of pseudo annotations may not effectively fulfill the requirements of semantic segmentation owing to the incomplete nature of the localization maps. In order to generate denser localization maps for WSSS, this paper proposes the use of a prototype learning guided multi-task network. Initially, the prototypes (also referred to as prototypical feature vectors) are employed to depict the similarities between images. Specifically, the shared information among different training images is thoroughly exploited to concomitantly learn the prototypes for both foreground categories and background. This approach facilitates the localization of more reliable background pixels and foreground regions by evaluating the similarities between the representative prototypes and the extracted features of pixels. Additionally, the learned prototypes can be incorporated into the multi-task network to enhance the efficiency of parameter optimization by adaptively rectifying errors in pixel-level supervision. Therefore, the optimization of the multi-task network for object localization and the production of high-quality proxy annotations can be achieved by means of clean image-level labels and refined pixel-level supervision working in conjunction. 
By selecting and refining proxy annotations, the performance of the segmentation algorithm can be further improved. Extensive experiments on two datasets, namely PASCAL VOC 2012 and COCO 2014, demonstrate that the proposed prototype learning guided multi-task network outperforms current state-of-the-art (SOTA) methods in segmentation performance, achieving a mean IoU of 72.1% and 72.6% on the PASCAL VOC 2012 validation and test sets, respectively.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117272"},"PeriodicalIF":3.4,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
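The core localization idea in this abstract, scoring each pixel's feature against learned class prototypes to find reliable foreground and background regions, can be sketched as below. This is an illustrative reconstruction under assumed shapes, not the authors' implementation; `prototype_localization` and the toy inputs are invented for the example.

```python
import numpy as np

def prototype_localization(features, prototypes):
    """Score each pixel against class prototypes via cosine similarity.

    features:   (H, W, D) pixel embeddings from a backbone.
    prototypes: (C, D) one prototypical feature vector per class
                (one index can serve as the background prototype).
    Returns an (H, W, C) map of per-class similarity scores.
    """
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    return f @ p.T  # cosine similarity for every pixel/class pair

# Toy check: pixels built from prototype 1 should score highest for class 1.
rng = np.random.default_rng(0)
protos = rng.normal(size=(3, 8))
feats = np.tile(protos[1], (4, 4, 1)) + 0.01 * rng.normal(size=(4, 4, 8))
loc = prototype_localization(feats, protos)
assert loc.shape == (4, 4, 3)
assert (loc.argmax(axis=-1) == 1).all()
```

Thresholding such a similarity map per class is one plausible way denser localization maps, and hence pseudo pixel-level annotations, could be derived.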
{"title":"Physics prior-based contrastive learning for low-light image enhancement","authors":"Hongxiang Liu, Yunliang Zhuang, Chen Lyu","doi":"10.1016/j.image.2025.117274","DOIUrl":"10.1016/j.image.2025.117274","url":null,"abstract":"<div><div>Capturing images in low-light conditions can lead to loss of image content, making low-light image enhancement a practically challenging task. Various deep-learning methods have been proposed to address this challenge, demonstrating significant progress. However, existing methods still face challenges in achieving uniform brightness enhancement. These methods rely solely on normal-light images to guide the training of the enhancement network, resulting in insufficient utilization of low-light image information. We propose a novel Illumination Contrastive Learning (ICL) scheme that employs positive and negative samples to form contrastive relationships and combines local brightness information to pull the illumination of enhanced images closer to normal-light references and away from low-light ones. Existing methods that use channel attention mechanisms often neglect global channel dependencies, leading to poor color contrast in enhanced images. We address this issue by developing a Multi-scale Channel Dependency Representation Block (MCRB) that utilizes multi-scale attention to capture a wide range of channel dependencies, thereby enhancing contrast more effectively. Based on the Retinex theory, our method maximizes the use of illumination information in low-light images and integrates contrastive learning into a Retinex-based framework. This integration results in a more uniform brightness distribution and improved visual effects in enhanced images. 
The effectiveness of our method has been validated through tests on various synthetic and natural datasets, surpassing existing state-of-the-art methods.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117274"},"PeriodicalIF":3.4,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
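The pull-toward-positive, push-from-negative behavior that the abstract describes can be sketched with a simple triplet-style term. This is a generic contrastive formulation assumed for illustration, not the paper's actual ICL loss; the function name and margin value are invented.

```python
import numpy as np

def illumination_contrastive_loss(anchor, positive, negative, margin=1.0):
    """Triplet-style contrastive term: pull the enhanced image's illumination
    features toward the normal-light positive and away from the low-light
    negative. Zero loss once the positive is closer by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# A well-enhanced result (near the normal-light positive) incurs less penalty
# than a poorly enhanced one that still resembles the low-light negative.
pos = np.ones(16)
neg = np.zeros(16)
good = pos + 0.05
bad = neg + 0.05
assert illumination_contrastive_loss(good, pos, neg) < illumination_contrastive_loss(bad, pos, neg)
```

In practice such a term would be computed on feature maps (here flat vectors stand in for them) and added to the reconstruction loss of the Retinex-based framework.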
{"title":"Anti-noise face: A resilient model for face recognition with labeled noise data","authors":"Lei Wang, Xun Gong, Jie Zhang, Rui Chen","doi":"10.1016/j.image.2025.117269","DOIUrl":"10.1016/j.image.2025.117269","url":null,"abstract":"<div><div>With the remarkable success of face recognition driven by large-scale datasets, noise learning has gained increasing attention due to the prevalence of noise within these datasets. While various margin-based loss functions and training strategies for label noise have been recently devised, two issues still remain to consider: (1) The explicit emphasis on specific characteristics of different types of noise is required. (2) The potential impact of noise during the early stages of training, which may lead to convergence issues, should not be ignored. In this study, we propose a comprehensive algorithm for learning with label noise. Compared to existing noise self-correction methods, we further enhance closed-set noise detection by introducing a closed-set noise self-correction module, and introduce a novel loss function for handling the remaining noisy samples detected by an improved Gaussian Mixture Model. Additionally, we use a progressive approach, working through the easy examples first and then moving on to the difficult ones, just as a student works through a course by tackling the easy material before the difficult material. 
Extensive experiments conducted on the synthesized noise dataset and on popular benchmarks have demonstrated the superior effectiveness of our approach over state-of-the-art alternatives.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117269"},"PeriodicalIF":3.4,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143172443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
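The Gaussian Mixture Model step mentioned above is commonly applied to per-sample losses: clean labels tend to produce low losses and noisy labels high ones. The sketch below fits a two-component 1-D mixture with EM from scratch; it is a standard construction assumed for illustration, not the paper's "improved" GMM, and `split_clean_noisy` is an invented name.

```python
import numpy as np

def split_clean_noisy(losses, iters=50):
    """Fit a two-component 1-D Gaussian mixture to per-sample losses with EM
    and return each sample's posterior probability of being clean
    (i.e., of belonging to the low-loss component)."""
    x = np.asarray(losses, dtype=float)
    mu = np.array([x.min(), x.max()])
    sig = np.array([x.std() + 1e-6] * 2)
    pi = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibilities of each component for each sample
        pdf = pi * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
        r = pdf / pdf.sum(axis=1, keepdims=True)
        # M-step: re-estimate means, spreads, and mixing weights
        n = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n
        sig = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n) + 1e-6
        pi = n / n.sum()
    clean = int(np.argmin(mu))  # low-loss component = presumed clean labels
    return r[:, clean]

# 90 low-loss (clean) and 10 high-loss (noisy) samples are separated well.
rng = np.random.default_rng(1)
losses = np.concatenate([rng.normal(0.2, 0.05, 90), rng.normal(2.0, 0.3, 10)])
p_clean = split_clean_noisy(losses)
assert (p_clean[:90] > 0.5).mean() > 0.9
assert (p_clean[90:] < 0.5).mean() > 0.9
```

The resulting posteriors could then gate which samples feed the standard loss versus the noise-tolerant one, and support the easy-to-hard curriculum the abstract describes.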
{"title":"Empower network to comprehend: Semantic guided and attention fusion GAN for underwater image enhancement","authors":"Xiao Liu , Ziwei Liu , Li Yu","doi":"10.1016/j.image.2025.117271","DOIUrl":"10.1016/j.image.2025.117271","url":null,"abstract":"<div><div>In fields such as underwater exploration, acquiring clear and precise imagery is paramount for gathering diverse underwater information. Consequently, the development of robust underwater image enhancement (UIE) algorithms is of great significance. Leveraging advancements in deep learning, UIE research has achieved substantial progress. Addressing the scarcity of underwater datasets and the imperative to refine the quality of enhanced reference images, this paper introduces a novel semantic-guided network architecture, termed SGAF-GAN. This model utilizes semantic information as an ancillary supervisory signal within the UIE network, steering the enhancement process towards semantically relevant areas while ameliorating issues with image edge blurriness. Moreover, in scenarios where rare image degradation co-occurs with semantically pertinent features, semantic information furnishes the network with prior knowledge, bolstering model performance and generalizability. This study integrates a feature attention fusion mechanism to preserve context information and amplify the influence of semantic guidance during cross-domain integration. Given the variable degradation in underwater images, the combination of spatial and channel attention empowers the network to assign more accurate weights to the most adversely affected regions, thereby elevating the overall image enhancement efficacy. Empirical evaluations demonstrate that SGAF-GAN excels across various real underwater datasets, aligning with human visual perception standards. 
On the SUIM dataset, SGAF-GAN achieves a PSNR of 24.30 and an SSIM of 0.9144.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117271"},"PeriodicalIF":3.4,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
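The combination of channel and spatial attention mentioned in the abstract can be sketched in its simplest form: reweight channels from global per-channel statistics, then reweight pixels from cross-channel statistics. This is a minimal parameter-free stand-in for illustration only; SGAF-GAN's actual attention blocks are learned and more elaborate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(x):
    """Apply channel attention followed by spatial attention to a (C, H, W)
    feature map, so strongly responding channels and regions are emphasized."""
    c_att = sigmoid(x.mean(axis=(1, 2)))   # (C,) one weight per channel
    x = x * c_att[:, None, None]
    s_att = sigmoid(x.mean(axis=0))        # (H, W) one weight per pixel
    return x * s_att[None, :, :]

feat = np.random.default_rng(2).normal(size=(4, 8, 8))
out = channel_spatial_attention(feat)
assert out.shape == feat.shape
```

In a learned version, the pooled statistics would pass through small trainable layers before the sigmoid, letting the network decide which degraded regions deserve the largest corrections.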
{"title":"SAR-CDCFRN: A novel SAR despeckling approach utilizing correlated dual channel feature-based residual network","authors":"Anirban Saha, Arihant K.R., Suman Kumar Maji","doi":"10.1016/j.image.2025.117267","DOIUrl":"10.1016/j.image.2025.117267","url":null,"abstract":"<div><div>As a result of the increasing need for capturing and processing visual data of the Earth’s surface, Synthetic Aperture Radar (SAR) technology has been widely embraced by all space research organisations. The primary drawback in the acquired SAR visuals (images) is the presence of unwanted granular noise, called “speckle”, which poses a limitation to their processing and analysis. Therefore removing this unwanted speckle noise from the captured SAR visuals, a process known as despeckling, becomes an important task. This article introduces a new despeckling residual network named SAR-CDCFRN. This network simultaneously extracts speckle components from both the spatial and inverse spatial channels. The extracted features are then correlated by a dual-layer attention block and further processed to predict the distribution of speckle in the input noisy image. The predicted distribution, which is the residual noise, is then mapped with the input noisy SAR data to generate a despeckled output image. 
Experimental results confirm the superiority of the proposed despeckling model over other existing technologies in the literature.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117267"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143128020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
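The residual formulation used above, predicting the speckle component and removing it from the noisy input, can be shown with an oracle predictor. This is only a demonstration of the residual-mapping idea; the `despeckle` helper and the synthetic speckle model are assumptions, not SAR-CDCFRN itself.

```python
import numpy as np

def despeckle(noisy, predict_residual):
    """Residual despeckling: a model predicts the speckle component of the
    input, and the clean image is recovered by removing that residual."""
    return noisy - predict_residual(noisy)

# Synthetic multiplicative speckle, expressed as an additive residual.
rng = np.random.default_rng(3)
clean = rng.uniform(0.2, 0.8, size=(16, 16))
speckle = clean * (rng.gamma(10.0, 1.0 / 10.0, size=clean.shape) - 1.0)
noisy = clean + speckle

# With an oracle residual predictor, the clean image is recovered exactly;
# the network's job is to approximate this predictor from data.
restored = despeckle(noisy, lambda y: speckle)
assert np.allclose(restored, clean)
```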
{"title":"Approximation-based energy-efficient cyber-secured image classification framework","authors":"M.A. Rahman , Salma Sultana Tunny , A.S.M. Kayes , Peng Cheng , Aminul Huq , M.S. Rana , Md. Rashidul Islam , Animesh Sarkar Tusher","doi":"10.1016/j.image.2025.117261","DOIUrl":"10.1016/j.image.2025.117261","url":null,"abstract":"<div><div>In this work, an energy-efficient cyber-secured framework for deep learning-based image classification is proposed. This simultaneously addresses two major concerns in relevant applications, which are typically handled separately in the existing works. An image approximation-based data storage scheme to improve the efficiency of memory usage while reducing energy consumption at both the source and user ends is discussed. Also, the proposed framework mitigates the impacts of two different adversarial attacks, notably retaining performance. The experimental analysis signifies the academic and industrial importance of this work as it demonstrates a 62.5% reduction in memory-access energy consumption for image classification, together with an equal 62.5% reduction in the effective memory sizes at both the source and user ends. During the improvement of memory efficiency, the multi-scale structural similarity index measure (MS-SSIM) is found to be the optimum image quality assessment method among different similarity-based metrics for the image classification task with approximated images and an average image quality of 0.9449 in terms of MS-SSIM is maintained. 
Also, a comparative analysis of three different classifiers with different depths indicates that the proposed scheme maintains up to 90.17% of original classification accuracy under normal and cyber-attack scenarios, effectively defending against untargeted and targeted white-box adversarial attacks with varying parameters.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117261"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143127943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
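To make the storage-approximation trade-off concrete: storing 3 of 8 bits per pixel is one hypothetical scheme that yields exactly the 62.5% reduction figure reported above. The paper's actual approximation method is not specified here, so this sketch is purely illustrative of how approximation trades bounded image error for memory.

```python
import numpy as np

def quantize(img8, bits=3):
    """Store an 8-bit image at `bits` bits per pixel and reconstruct an
    approximation on load. Keeping 3 of 8 bits is a 62.5% storage reduction."""
    levels = 2 ** bits
    q = (img8.astype(np.uint16) * levels // 256).astype(np.uint8)            # compress
    return (q.astype(np.uint16) * 255 // (levels - 1)).astype(np.uint8)      # reconstruct

img = np.arange(256, dtype=np.uint8).reshape(16, 16)
approx = quantize(img, bits=3)
assert approx.shape == img.shape
assert np.abs(approx.astype(int) - img.astype(int)).max() <= 36  # coarse but bounded error
assert 1 - 3 / 8 == 0.625  # fraction of storage (and memory-access energy) saved
```

A classifier trained or evaluated on such approximated images would then be scored with MS-SSIM against the originals, as the abstract describes.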
{"title":"Spiking two-stream methods with unsupervised STDP-based learning for action recognition","authors":"Mireille El-Assal, Pierre Tirilly, Ioan Marius Bilasco","doi":"10.1016/j.image.2025.117263","DOIUrl":"10.1016/j.image.2025.117263","url":null,"abstract":"<div><div>Video analysis is a computer vision task that is useful for many applications like surveillance, human-machine interaction, and autonomous vehicles. Deep learning methods are currently the state-of-the-art methods for video analysis. Particularly, two-stream methods, which leverage both spatial and temporal information, have proven to be valuable in Human Action Recognition (HAR). However, they have high computational costs, and need a large amount of labeled data for training. In addressing these challenges, this paper adopts a more efficient approach by leveraging Convolutional Spiking Neural Networks (CSNNs) trained with the unsupervised Spike Timing-Dependent Plasticity (STDP) learning rule for action classification. These networks represent the information using asynchronous low-energy spikes, which allows the network to be more energy efficient when implemented on neuromorphic hardware. Furthermore, learning visual features with unsupervised learning reduces the need for labeled data during training, making the approach doubly advantageous. Therefore, we explore transposing two-stream convolutional neural networks into the spiking domain, where we train each stream with the unsupervised STDP learning rule. We investigate the performance of these networks in video analysis by employing five distinct configurations for the temporal stream, and evaluate them across four benchmark HAR datasets. In this work, we show that two-stream CSNNs can successfully extract spatio-temporal information from videos despite using limited training data, and that the spiking spatial and temporal streams are complementary. 
We also show that replacing a dedicated temporal stream with a spatio-temporal one within a spiking two-stream architecture leads to information redundancy that hinders the performance.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117263"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
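The unsupervised STDP rule at the heart of this approach can be sketched in its classic pair-based form: a synapse strengthens when the presynaptic spike precedes the postsynaptic one, and weakens otherwise, with magnitude decaying in the spike-time gap. The constants below are illustrative defaults, not the paper's settings.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.03, tau=20.0):
    """Pair-based STDP weight update, clipped to [0, 1].

    t_pre, t_post: spike times (ms) of the pre- and postsynaptic neurons.
    """
    dt = t_post - t_pre
    if dt > 0:
        dw = a_plus * np.exp(-dt / tau)    # pre before post: potentiation
    else:
        dw = -a_minus * np.exp(dt / tau)   # post before pre: depression
    return float(np.clip(w + dw, 0.0, 1.0))

w0 = 0.5
assert stdp_update(w0, t_pre=10.0, t_post=15.0) > w0   # causal pair strengthens
assert stdp_update(w0, t_pre=15.0, t_post=10.0) < w0   # anti-causal pair weakens
```

Because the update depends only on local spike timing, no labels are needed, which is what lets each stream of the two-stream CSNN learn its features unsupervised.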
{"title":"Conditional Laplacian pyramid networks for exposure correction","authors":"Mengyuan Huang , Kan Chang , Qingpao Qin , Yahui Tang , Guiqing Li","doi":"10.1016/j.image.2025.117276","DOIUrl":"10.1016/j.image.2025.117276","url":null,"abstract":"<div><div>Improper exposures greatly degenerate the visual quality of images. Correcting various exposure errors in a unified framework is challenging as it requires simultaneously handling global attributes and local details under different exposure conditions. In this paper, we propose a conditional Laplacian pyramid network (CLPN) for correcting different exposure errors in the same framework. It applies a Laplacian pyramid to decompose an improperly exposed image into a low-frequency (LF) component and several high-frequency (HF) components, and then enhances the decomposed components in a coarse-to-fine manner. To consistently correct a wide range of exposure errors, a conditional feature extractor is designed to extract the conditional feature from the given image. Afterwards, the conditional feature is used to guide the refinement of LF features, so that a precise correction for illumination, contrast and color tone can be obtained. As different frequency components exhibit pixel-wise correlations, the frequency components in lower pyramid layers are used to support the reconstruction of the HF components in higher layers. By doing so, fine details can be effectively restored, while noise can be well suppressed. 
Extensive experiments show that our method is more effective than state-of-the-art methods on correcting various exposure conditions ranging from severe underexposure to intense overexposure.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"134 ","pages":"Article 117276"},"PeriodicalIF":3.4,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143171430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
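The Laplacian pyramid decomposition that CLPN builds on can be sketched with crude resampling filters (2x average pooling down, nearest-neighbour up); real implementations use Gaussian filtering, but the invertibility property is the same. Function names here are invented for the example.

```python
import numpy as np

def down(x):
    """2x downsampling by average pooling (expects even dimensions)."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def up(x):
    """2x nearest-neighbour upsampling."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    """Split an image into high-frequency bands plus a low-frequency residual."""
    bands, cur = [], img
    for _ in range(levels):
        low = down(cur)
        bands.append(cur - up(low))  # HF detail lost at this scale
        cur = low
    bands.append(cur)                # final LF component
    return bands

def reconstruct(bands):
    """Invert the decomposition: upsample and add the HF bands back in."""
    cur = bands[-1]
    for hf in reversed(bands[:-1]):
        cur = up(cur) + hf
    return cur

img = np.random.default_rng(4).uniform(size=(32, 32))
pyr = laplacian_pyramid(img, levels=3)
assert len(pyr) == 4
assert np.allclose(reconstruct(pyr), img)  # decomposition is exactly invertible
```

Because reconstruction proceeds coarse-to-fine, a network can correct the LF component globally (illumination, color tone) and then refine each HF band with support from the layers below it, as the abstract describes.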
{"title":"ATM-DEN: Image Inpainting via attention transfer module and Decoder-Encoder network","authors":"Siwei Zhang , Yuantao Chen","doi":"10.1016/j.image.2025.117268","DOIUrl":"10.1016/j.image.2025.117268","url":null,"abstract":"<div><div>The current prevailing techniques for image restoration predominantly employ self-encoding and decoding networks, aiming to reconstruct the original image during the decoding phase utilizing the compressed data captured during encoding. Nevertheless, the self-encoding network inherently suffers from information loss during compression, rendering it challenging to achieve nuanced restoration outcomes solely reliant on compressed information, particularly manifesting as blurred imagery and distinct edge artifacts around the restored areas. To mitigate this issue of insufficient image information utilization, we introduce a Multi-Stage Decoding Network in this study. This network leverages multiple decoders to decode and integrate features from each layer of the encoding stage, thereby enhancing the exploitation of encoder features across various scales. Subsequently, a feature mapping is derived that more accurately captures the content of the impaired region. Comparative experiments conducted on globally recognized datasets demonstrate that MSDN achieves a notable enhancement in the visual quality of restored images.</div></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"133 ","pages":"Article 117268"},"PeriodicalIF":3.4,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143127942","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
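The multi-stage decoding idea in this abstract, fusing encoder features from every scale into the decoding path rather than relying on the final code alone, can be sketched as follows. This toy version uses average pooling as a stand-in encoder and fixed fusion weights; it illustrates the skip-fusion structure, not the actual ATM-DEN/MSDN architecture.

```python
import numpy as np

def encoder(img, levels=3):
    """Toy encoder that keeps the feature map from every stage
    (successive 2x average poolings) instead of only the final code."""
    feats = [img]
    for _ in range(levels):
        x = feats[-1]
        feats.append(x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3)))
    return feats

def multi_stage_decode(feats):
    """Each decoding stage upsamples the running estimate and fuses it with
    the matching-scale encoder feature, re-injecting information that the
    compression to the bottleneck code had discarded."""
    x = feats[-1]
    for skip in reversed(feats[:-1]):
        x = x.repeat(2, axis=0).repeat(2, axis=1)  # upsample running estimate
        x = 0.5 * x + 0.5 * skip                   # fuse with same-scale encoder feature
    return x

img = np.random.default_rng(5).uniform(size=(16, 16))
out = multi_stage_decode(encoder(img))
assert out.shape == img.shape
```

In the learned setting, the fixed 0.5/0.5 fusion would be replaced by trainable layers (e.g., the attention transfer module named in the title), but the multi-scale skip structure is the part this sketch captures.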