{"title":"Learning content-aware feature fusion for guided depth map super-resolution","authors":"Yifan Zuo , Hao Wang , Yaping Xu , Huimin Huang , Xiaoshui Huang , Xue Xia , Yuming Fang","doi":"10.1016/j.image.2024.117140","DOIUrl":"https://doi.org/10.1016/j.image.2024.117140","url":null,"abstract":"<div><p>RGB-D data including paired RGB color images and depth maps is widely used in downstream computer vision tasks. However, compared with the acquisition of high-resolution color images, the depth maps captured by consumer-level sensors are always in low resolution. Within decades of research, the most state-of-the-art (SOTA) methods of depth map super-resolution cannot adaptively tune the guidance fusion for all feature positions by channel-wise feature concatenation with spatially sharing convolutional kernels. This paper proposes JTFNet to resolve this issue, which simulates the traditional Joint Trilateral Filter (JTF). Specifically, a novel JTF block is introduced to adaptively tune the fusion pattern between the color features and the depth features for all feature positions. Moreover, based on the variant of JTF block whose target features and guidance features are in the cross-scale shape, the fusion for depth features is performed in a bi-directional way. Therefore, the error accumulation along scales can be effectively mitigated by iteratively HR feature guidance. Compared with the SOTA methods, the sufficient experiment is conducted on the mainstream synthetic datasets and real datasets, <em>i.e.,</em> Middlebury, NYU and ToF-Mark, which shows remarkable improvement of our JTFNet.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"126 ","pages":"Article 117140"},"PeriodicalIF":3.5,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140914295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DCID: A divide and conquer approach to solving the trade-off problem between artifacts caused by enhancement procedure in image downscaling","authors":"Eun Su Kang, Yeon Jeong Chae, Jae Hyeon Park, Sung In Cho","doi":"10.1016/j.image.2024.117133","DOIUrl":"https://doi.org/10.1016/j.image.2024.117133","url":null,"abstract":"<div><p>Conventional research on image downscaling is conducted to improve the visual quality of the resultant downscaled image. However, there is an intractable problem, a trade-off relationship between artifacts such as aliasing and ringing, caused by enhancement procedure in image downscaling. To solve this problem, we propose a novel method that applies a divide-and-conquer approach for image downscaling (DCID). Specifically, the proposed DCID includes Weight-Net for dividing regions into enhancement first and artifact-least first regions and two generators that are optimized for divided regions to conquer the trade-off problem in the image downscaling task. The proposed method can generate a downscaled image without creating artifacts while preserving the perceptual quality of the input image. In objective and subjective evaluations, our experimental results show that the quality of the downscaled images generated by the proposed DCID is significantly better than benchmark methods.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"126 ","pages":"Article 117133"},"PeriodicalIF":3.5,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140951313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep neural network based distortion parameter estimation for blind quality measurement of stereoscopic images","authors":"Yi Zhang , Damon M. Chandler , Xuanqin Mou","doi":"10.1016/j.image.2024.117138","DOIUrl":"https://doi.org/10.1016/j.image.2024.117138","url":null,"abstract":"<div><p>Stereoscopic/3D image quality measurement (SIQM) has emerged as an active and important research branch in image processing/computer vision field. Existing methods for blind/no-reference SIQM often train machine-learning models on degraded stereoscopic images for which human subjective quality ratings have been obtained, and they are thus constrained by the fact that only a limited number of 3D image quality datasets currently exist. Although methods have been proposed to overcome this restriction by predicting distortion parameters rather than quality scores, the approach is still limited to the time-consuming, hand-crafted features extracted to train the corresponding classification/regression models as well as the rather complicated binocular fusion/rivalry models used to predict the cyclopean view. In this paper, we explore the use of deep learning to predict distortion parameters, giving rise to a more efficient opinion-unaware SIQM technique. Specifically, a deep fusion-and-excitation network which takes into account the multiple-distortion interactions is proposed to perform distortion parameter estimation, thus avoiding hand-crafted features by using convolution layers while simultaneously accelerating the algorithm by using the GPU. Moreover, we measure distortion parameter values of the cyclopean view by using support vector regression models which are trained on the data obtained from a newly-designed subjective test. In this way, the potential errors in computing the disparity map and cyclopean view can be prevented, leading to a more rapid and precise 3D-vision distortion parameter estimation. Experimental results tested on various 3D image quality datasets demonstrate that our proposed method, in most cases, offers improved predictive performance over existing state-of-the-art methods.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"126 ","pages":"Article 117138"},"PeriodicalIF":3.5,"publicationDate":"2024-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140906615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CVEGAN: A perceptually-inspired GAN for Compressed Video Enhancement","authors":"Di Ma, Fan Zhang, David R. Bull","doi":"10.1016/j.image.2024.117127","DOIUrl":"https://doi.org/10.1016/j.image.2024.117127","url":null,"abstract":"<div><p>We propose a new Generative Adversarial Network for Compressed Video frame quality Enhancement (CVEGAN). The CVEGAN generator benefits from the use of a novel Mul<sup>2</sup>Res block (with multiple levels of residual learning branches), an enhanced residual non-local block (ERNB) and an enhanced convolutional block attention module (ECBAM). The ERNB has also been employed in the discriminator to improve the representational capability. The training strategy has also been re-designed specifically for video compression applications, to employ a relativistic sphere GAN (ReSphereGAN) training methodology together with new perceptual loss functions. The proposed network has been fully evaluated in the context of two typical video compression enhancement tools: post-processing (PP) and spatial resolution adaptation (SRA). CVEGAN has been fully integrated into the MPEG HEVC and VVC video coding test models (HM 16.20 and VTM 7.0) and experimental results demonstrate significant coding gains (up to 28% for PP and 38% for SRA compared to the anchor) over existing state-of-the-art architectures for both coding tools across multiple datasets based on the HM 16.20. The respective gains for VTM 7.0 are up to 8.0% for PP and up to 20.3% for SRA.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"127 ","pages":"Article 117127"},"PeriodicalIF":3.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0923596524000286/pdfft?md5=3b459f9525f84784af198f2f1adf008e&pid=1-s2.0-S0923596524000286-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141323046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"STCC-Filter: A space-time-content correlation-based noise filter with self-adjusting threshold for event camera","authors":"Mengjie Li , Yujie Huang , Mingyu Wang , Wenhong Li , Xiaoyang Zeng","doi":"10.1016/j.image.2024.117136","DOIUrl":"https://doi.org/10.1016/j.image.2024.117136","url":null,"abstract":"<div><p>Bio-inspired event cameras have become a new paradigm of image sensors detecting illumination changes asynchronously and independently for each pixel. However, their sensitivity to noise degrades the output quality. Most existing denoising methods based on spatiotemporal correlation deteriorate in low light conditions due to frequently bursting noise. To tackle this challenge and remove noise for neuromorphic cameras, this paper proposes space–time-content correlation (STCC) and a novel noise filter with self-adjusted threshold, STCC-Filter. In the proposed denoising algorithm, content correlation is modeled based on the brightness change patterns caused by moving objects. Furthermore, space–time and content support from a sequence of events within the range specified by the threshold which can be programmed based on the real application scenarios are fully utilized to improve the robustness and performance of denoising. STCC-Filter is evaluated on widely used datasets and our labeled synthesized datasets. The experimental results demonstrate that the proposed method outperforms traditional spatiotemporal-correlation-based methods in removing more noise and preserving more signals.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"126 ","pages":"Article 117136"},"PeriodicalIF":3.5,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140894883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Universal deep demosaicking for sparse color filter arrays","authors":"Chenyan Bai , Wenxing Qiao , Jia Li","doi":"10.1016/j.image.2024.117135","DOIUrl":"https://doi.org/10.1016/j.image.2024.117135","url":null,"abstract":"<div><p>Sparse color filter array (CFA) is a potential alternative for the commonly used Bayer CFA, which uses only red (R), green (G), and blue (B) pixels. In sparse CFAs, most pixels are panchromatic (white) ones and only a small percentage of pixels are RGB pixels. Sparse CFAs have the motivation of human visual system and superior low-light photography performance. However, most of the associated demosaicking methods highly depend on synthetic images and are limited to a few specific CFAs. In this paper, we propose a universal demosaicking method for sparse CFAs. Our method has two sequential steps: W-channel recovery and RGB-channel reconstruction. More specifically, it first uses the W channel inpainting network (WCI-Net) to recover the W channel. The first layer of WCI-Net performs the scatter-weighted interpolation, which enables the network to work with various CFAs. Then it employs the differentiable guided filter to reconstruct the RGB channels with the reference of recovered W channel. The differentiable guided filter introduces a binary mask to specify the positions of RGB pixels. So it can handle arbitrary sparse CFAs. Also, it can be trained end-to-end and hence could obtain superior performance but do not overfit the synthetic images. Experiments on clean and noisy images confirm the advantage of the proposed demosaicking method.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"126 ","pages":"Article 117135"},"PeriodicalIF":3.5,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140825435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive learning for deep tone mapping operator","authors":"Di Li , Mou Wang , Susanto Rahardja","doi":"10.1016/j.image.2024.117130","DOIUrl":"https://doi.org/10.1016/j.image.2024.117130","url":null,"abstract":"<div><p>Most existing tone mapping operators (TMOs) are developed based on prior assumptions of human visual system, and they are known to be sensitive to hyperparameters. In this paper, we proposed a straightforward yet efficient framework to automatically learn the priors and perform tone mapping in an end-to-end manner. The proposed algorithm utilizes a contrastive learning framework to enforce the content consistency between high dynamic range (HDR) inputs and low dynamic range (LDR) outputs. Since contrastive learning aims at maximizing the mutual information across different domains, no paired images or labels are required in our algorithm. Equipped with an attention-based U-Net to alleviate the aliasing and halo artifacts, our algorithm can produce sharp and visually appealing images over various complex real-world scenes, indicating that the proposed algorithm can be used as a strong baseline for future HDR image tone mapping task. Extensive experiments as well as subjective evaluations demonstrated that the proposed algorithm outperforms the existing state-of-the-art algorithms qualitatively and quantitatively. The code is available at <span>https://github.com/xslidi/CATMO</span><svg><path></path></svg>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"126 ","pages":"Article 117130"},"PeriodicalIF":3.5,"publicationDate":"2024-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140894882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"U-ATSS: A lightweight and accurate one-stage underwater object detection network","authors":"Junjun Wu, Jinpeng Chen, Qinghua Lu, Jiaxi Li, Ningwei Qin, Kaixuan Chen, Xilin Liu","doi":"10.1016/j.image.2024.117137","DOIUrl":"https://doi.org/10.1016/j.image.2024.117137","url":null,"abstract":"<div><p>Due to the harsh and unknown marine environment and the limited diving ability of human beings, underwater robots become an important role in ocean exploration and development. However, the performance of underwater robots is limited by blurred images, low contrast and color deviation, which are resulted from complex underwater imaging environments. The existing mainstream object detection networks perform poorly when applied directly to underwater tasks. Although using a cascaded detector network can get high accuracy, the inference speed is too slow to apply to actual tasks. To address the above problems, this paper proposes a lightweight and accurate one-stage underwater object detection network, called U-ATSS. Firstly, we compressed the backbone of ATSS to significantly reduce the number of network parameters and improve the inference speed without losing the detection accuracy, to achieve lightweight and real-time performance of the underwater object detection network. Then, we propose a plug-and-play receptive field module F-ASPP, which can obtain larger receptive fields and richer spatial information, and optimize the learning rate strategy as well as classification loss function to significantly improve the detection accuracy and convergence speed. We evaluated and compared U-ATSS with other methods on the Kesci Underwater Object Detection Algorithm Competition dataset containing a variety of marine organisms. The experimental results show that U-ATSS not only has obvious lightweight characteristics, but also shows excellent performance and competitiveness in terms of detection accuracy.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"126 ","pages":"Article 117137"},"PeriodicalIF":3.5,"publicationDate":"2024-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140906614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image splicing detection using low-dimensional feature vector of texture features and Haralick features based on Gray Level Co-occurrence Matrix","authors":"Debjit Das, Ruchira Naskar","doi":"10.1016/j.image.2024.117134","DOIUrl":"https://doi.org/10.1016/j.image.2024.117134","url":null,"abstract":"<div><p><em>Digital image forgery</em> has become hugely widespread, as numerous easy-to-use, low-cost image manipulation tools have become widely available to the common masses. Such forged images can be used with various malicious intentions, such as to harm the social reputation of renowned personalities, to perform identity fraud resulting in financial disasters, and many more illegitimate activities. <em>Image splicing</em> is a form of image forgery where an adversary intelligently combines portions from multiple source images to generate a natural-looking artificial image. Detection of image splicing attacks poses an open challenge in the forensic domain, and in recent literature, several significant research findings on image splicing detection have been described. However, the number of features documented in such works is significantly huge. Our aim in this work is to address the issue of feature set optimization while modeling image splicing detection as a classification problem and preserving the forgery detection efficiency reported in the state-of-the-art. This paper proposes an image-splicing detection scheme based on textural features and Haralick features computed from the input image’s Gray Level Co-occurrence Matrix (GLCM) and also localizes the spliced regions in a detected spliced image. We have explored the well-known Columbia Image Splicing Detection Evaluation Dataset and the DSO-1 dataset, which is more challenging because of its constituent post-processed color images. Experimental results prove that our proposed model obtains 95% accuracy in image splicing detection with an AUC score of 0.99, with an optimized feature set of dimensionality of 15 only.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"125 ","pages":"Article 117134"},"PeriodicalIF":3.5,"publicationDate":"2024-04-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140816438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flow-based multi-scale learning network for single image stochastic super-resolution","authors":"Qianyu Wu , Zhongqian Hu , Aichun Zhu , Hui Tang , Jiaxin Zou , Yan Xi , Yang Chen","doi":"10.1016/j.image.2024.117132","DOIUrl":"10.1016/j.image.2024.117132","url":null,"abstract":"<div><p>Single image super-resolution (SISR) is still an important while challenging task. Existing methods usually ignore the diversity of generated Super-Resolution (SR) images. The fine details of the corresponding high-resolution (HR) images cannot be confidently recovered due to the degradation of detail in low-resolution (LR) images. To address the above issue, this paper presents a flow-based multi-scale learning network (FMLnet) to explore the diverse mapping spaces for SR. First, we propose a multi-scale learning block (MLB) to extract the underlying features of the LR image. Second, the introduced pixel-wise multi-head attention allows our model to map multiple representation subspaces simultaneously. Third, by employing a normalizing flow module for a given LR input, our approach generates various stochastic SR outputs with high visual quality. The trade-off between fidelity and perceptual quality can be controlled. Finally, the experimental results on five datasets demonstrate that the proposed network outperforms the existing methods in terms of diversity, and achieves competitive PSNR/SSIM results. Code is available at <span>https://github.com/qianyuwu/FMLnet</span><svg><path></path></svg>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"125 ","pages":"Article 117132"},"PeriodicalIF":3.5,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140760491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}