Learned fractional downsampling network for adaptive video streaming
Li-Heng Chen, Christos G. Bampis, Zhi Li, Joel Sole, Chao Chen, Alan C. Bovik
Signal Processing-Image Communication, vol. 128, Article 117172. DOI: 10.1016/j.image.2024.117172. Published 2024-07-22.

Abstract: Given increasing demand for very large format contents and displays, spatial resolution changes have become an important part of video streaming. In particular, video downscaling is a key ingredient that streaming providers implement in their encoding pipeline as part of video quality optimization workflows. Here, we propose a downsampling network architecture that progressively reconstructs residuals at different scales. Since the layers of convolutional neural networks (CNNs) can only be used to alter the resolutions of their inputs by integer scale factors, we seek new ways to achieve fractional scaling, which is crucial in many video processing applications. More concretely, we utilize an alternative building block, formulated as a conventional convolutional layer followed by a differentiable resizer. To validate the efficacy of our proposed downsampling network, we integrated it into a modern video encoding system for adaptive streaming. We extensively evaluated our method using a variety of different video codecs and upsampling algorithms to show its generality. The experimental results show that improvements in coding efficiency over the conventional Lanczos algorithm and state-of-the-art methods are attained, in terms of PSNR, SSIM, and VMAF, when tested on high-resolution test videos. In addition to quantitative experiments, we also carried out a subjective quality study, validating that the proposed downsampling model yields favorable results.
{"title":"A modified hue and range preserving color assignment function with a component-wise saturation adjustment for color image enhancement","authors":"Sepideh Khormaeipour, Fatemeh Shakeri","doi":"10.1016/j.image.2024.117174","DOIUrl":"10.1016/j.image.2024.117174","url":null,"abstract":"<div><p>This paper presents a new approach to enhancing color images by modifying an affine color assignment function. This function maps colors to pixels of the enhanced gray-scale image in a way that improves the visual quality of the image, particularly in darker regions. The main goal of our method is to finely adjust saturation, correct saturation loss in specific image regions, and preserve the original image’s range and hue. Our proposed method follows a two-step process. First, it enhances the intensity image using a combination of global and local histogram equalization methods. This results in an overall improved appearance by redistributing pixel intensities and enhancing contrast. Then, modified color mapping functions are applied to assign colors to each pixel of the enhanced gray-scale image. The aim is to adjust saturation by amplifying the maximally saturated color image. Additionally, we introduce two new color-weighted maps to evaluate pixel importance from the maximally saturated image. This contributes to saturation control in the final enhanced image. Compared to alternative color mapping algorithms, our model preserves the original color of pixels in challenging areas and fine-tunes saturation based on parameter settings.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117174"},"PeriodicalIF":3.4,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141851907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MGFA: A multi-scale global feature autoencoder to fuse infrared and visible images
Xiaoxuan Chen, Shuwen Xu, Shaohai Hu, Xiaole Ma
Signal Processing-Image Communication, vol. 128, Article 117168. DOI: 10.1016/j.image.2024.117168. Published 2024-07-14.

Abstract: The convolutional operation pays too much attention to local information, resulting in the loss of global information and a decline in fusion quality. In order to ensure that the fused image fully captures the features of the entire scene, an end-to-end Multi-scale Global Feature Autoencoder (MGFA) is proposed in this paper, which can generate fused images with both global and local information. In this network, a multi-scale global feature extraction module is proposed, which combines dilated convolutional modules with the Global Context Block (GCBlock) to extract the global features ignored by the convolutional operation. In addition, an adaptive embedded residual fusion module is proposed to fuse different frequency components in the source images with the idea of embedded residual learning. This can enrich the detailed texture of the fused results. Extensive qualitative and quantitative experiments have demonstrated that the proposed method achieves excellent results in retaining global information and improving visual effects. Furthermore, the fused images obtained in this paper are better suited to the object detection task and can assist in improving detection precision.
{"title":"USteg-DSE: Universal quantitative Steganalysis framework using Densenet merged with Squeeze & Excitation net","authors":"Anuradha Singhal, Punam Bedi","doi":"10.1016/j.image.2024.117171","DOIUrl":"10.1016/j.image.2024.117171","url":null,"abstract":"<div><p>Carrying concealed communication via media is termed as steganography and unraveling details of such covert transmission is known as steganalysis. Extracting details of hidden message like length, position, embedding algorithm etc. forms part of forensic steganalysis. Predicting length of payload in camouflaged interchange is termed as quantitative steganalysis and is an indispensable tool for forensic investigators. When payload length is estimated without any prior knowledge about cover media or used steganography algorithm, it is termed as universal quantitative steganalysis.</p><p>Most of existing frameworks on quantitative steganalysis available in literature, work for a specific embedding algorithm or are domain specific. In this paper we propose and present USteg-DSE, a deep learning framework for performing universal quantitative image steganalysis using DenseNet with Squeeze & Excitation module (SEM). In deep learning techniques, deeper networks easily capture complex statistical properties. But as depth increases, networks suffer from vanishing gradient problem. In classic architectures, all channels are equally weighted to produce feature maps. Presented USteg-DSE framework overcomes these problems by using DenseNet and SEM. In DenseNet, each layer is directly connected with every other layer. DenseNet makes information and gradient flow easier with fewer feature maps. SEM incorporates content aware mechanism to adaptively regulate weight for every feature map. Presented framework has been compared with existing state-of-the-art techniques for spatial domain as well as transform domain and show better results in terms of Mean Absolute Error (MAE) and Mean Square Error (MSE).</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117171"},"PeriodicalIF":3.4,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141705212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning sparse feature representation for blind quality assessment of night-time images","authors":"Maryam Karimi , Mansour Nejati","doi":"10.1016/j.image.2024.117167","DOIUrl":"10.1016/j.image.2024.117167","url":null,"abstract":"<div><p>Capturing Night-Time Images (NTIs) with high-quality is quite challenging for consumer photography and several practical applications. Thus, addressing the quality assessment of night-time images is urgently needed. Since there is no available reference image for such images, Night-Time image Quality Assessment (NTQA) should be done blindly. Although Blind natural Image Quality Assessment (BIQA) has attracted a great deal of attention for a long time, very little work has been done in the field of NTQA. Due to the capturing conditions, NTIs suffer from various complex authentic distortions that make it a challenging field of research. Therefore, previous BIQA methods, do not provide sufficient correlation with subjective scores in the case of NTIs and special methods of NTQA should be developed. In this paper we conduct an unsupervised feature learning method for blind quality assessment of night-time images. The features are the sparse representation over the data-adaptive dictionaries learned on the image exposure and gradient magnitude maps. Having these features, an ensemble regression model trained using least squares gradient boosting scheme predicts high correlated objective scores on the standard datasets.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117167"},"PeriodicalIF":3.4,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141630394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prior-DualGAN: Rain rendering from coarse to fine
Mingdi Hu, Jingbing Yang, Jianxun Yu, Bingyi Jing
Signal Processing-Image Communication, vol. 129, Article 117170. DOI: 10.1016/j.image.2024.117170. Published 2024-07-11.

Abstract: The success of deep neural networks (DNN) in deraining has led to increased research in rain rendering. In this paper, we introduce a novel Prior-DualGAN algorithm to synthesize diverse and realistic rainy/non-rainy image pairs to improve DNN training for deraining. More precisely, the rain streak prior is first generated using essential rain streak attributes; then more realistic and diverse rain streak patterns are rendered by the first generator; finally, the second generator naturally fuses the background and generated rain streaks to produce the final rainy images. Our method has two main advantages: (1) the rain streak prior enables the network to incorporate physical prior knowledge, accelerating network convergence; (2) our dual GAN approach gradually improves the naturalness and diversity of synthesized rainy images from rain streak synthesis to rainy image synthesis. We evaluate existing deraining algorithms using our generated rain-augmented datasets Rain100L, Rain14000, and Rain-Vehicle, verifying that training with our generated rain-augmented datasets significantly improves the deraining effect. The source code will be released shortly after the article's acceptance.
{"title":"LMNet: A learnable multi-scale cost volume for stereo matching","authors":"Jiatao Liu , Yaping Zhang","doi":"10.1016/j.image.2024.117169","DOIUrl":"10.1016/j.image.2024.117169","url":null,"abstract":"<div><p>Calculating disparities through stereo matching is an important step in a variety of machine vision tasks used for robotics and similar applications. The use of deep neural networks for stereo matching requires the construction of a matching cost volume. However, the occluded, non-textured, and reflective regions are ill-posed, which cannot be directly matched. In previous studies, a direct calculation has typically been used to measure matching costs for single-scale feature maps, which makes it difficult to predict disparity for ill-posed regions. Thus, we propose a learnable multi-scale matching cost calculation method (LMNet) to improve the accuracy of stereo matching. This learned matching cost can reasonably estimate the disparity of the regions that are conventionally difficult to match. Multi-level 3D dilation convolutions for multi-scale features are introduced during constructing cost volumes because the receptive field of the convolution kernels is limited. The experimental results show that the proposed method achieves significant improvement in ill-posed regions. Compared with the classical architecture GwcNet, End-Point-Error (EPE) of the proposed method on the Scene Flow dataset is reduced by 16.46%. The number of parameters and required calculations are also reduced by 8.71% and 20.05%, respectively. The proposed model code and pre-training parameters are available at: <span><span>https://github.com/jt-liu/LMNet</span><svg><path></path></svg></span>.</p></div>","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"128 ","pages":"Article 117169"},"PeriodicalIF":3.4,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141698516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-scale strip-shaped convolution attention network for lightweight image super-resolution
Ke Xu, Lulu Pan, Guohua Peng, Wenbo Zhang, Yanheng Lv, Guo Li, Lingxiao Li, Le Lei
Signal Processing-Image Communication, vol. 128, Article 117166. DOI: 10.1016/j.image.2024.117166. Published 2024-07-11.

Abstract: Lightweight convolutional neural networks for Single Image Super-Resolution (SISR) have exhibited remarkable performance improvements in recent years. These models achieve excellent performance by relying on attention mechanisms that incorporate square-shaped convolutions to enhance feature representation. However, these approaches still suffer from the redundancy introduced by square-shaped convolutional kernels and overlook the utilization of multi-scale information. In this paper, we propose a novel attention mechanism called Multi-scale Strip-shaped convolution Attention (MSA), which utilizes three sets of differently sized depth-wise separable strip convolution kernels in parallel to replace the redundant square-shaped convolution attention and extract multi-scale features. We also generalize MSA to other lightweight neural network models, and experimental results show that MSA outperforms other convolution-based attention mechanisms. Building upon MSA, we propose an Efficient Feature Extraction Block (EFEB), a lightweight block for SISR. Finally, based on EFEB, we propose a lightweight image super-resolution neural network named Multi-scale Strip-shaped convolution Attention Network (MSAN). Experiments demonstrate that MSAN outperforms existing state-of-the-art lightweight SR methods with fewer parameters and lower computational complexity.
A foreground-context dual-guided network for light-field salient object detection
Xin Zheng, Boyang Wang, Deyang Liu, Chengtao Lv, Jiebin Yan, Ping An
Signal Processing-Image Communication, vol. 128, Article 117165. DOI: 10.1016/j.image.2024.117165. Published 2024-06-26.

Abstract: Light-field salient object detection (SOD) has become an emerging trend, as light fields record comprehensive information about natural scenes that can benefit salient object detection in various ways. However, salient object detection models that take light-field data as input have not been thoroughly explored. The existing methods cannot effectively suppress noise, and it is difficult to distinguish the foreground from the background under challenging conditions including self-similarity, complex backgrounds, large depth of field, and non-Lambertian scenarios. In order to extract features of light-field images effectively and suppress noise in the light field, in this paper we propose a foreground-context dual-guided network. Specifically, we design a global context extraction module (GCEM) and a local foreground extraction module (LFEM). GCEM is used to suppress global noise and roughly predict saliency maps; it also extracts global context information from deep-level features to guide the decoding process. By extracting local information from shallow levels, LFEM refines the prediction obtained by GCEM. In addition, we use RGB images to enhance the light-field images before they are input to GCEM. Experimental results show that the proposed method is effective in suppressing global noise, achieves better results when dealing with transparent objects and complex backgrounds, and outperforms several other state-of-the-art methods on three light-field datasets.
PGGNet: Pyramid gradual-guidance network for RGB-D indoor scene semantic segmentation
Wujie Zhou, Gao Xu, Meixin Fang, Shanshan Mao, Rongwang Yang, Lu Yu
Signal Processing-Image Communication, vol. 128, Article 117164. DOI: 10.1016/j.image.2024.117164. Published 2024-06-22.

Abstract: In RGB-D (red-green-blue and depth) scene semantic segmentation, depth maps provide rich spatial information to RGB images to achieve high performance. However, properly aggregating depth information and reducing noise and information loss during feature encoding after fusion are challenging aspects in scene semantic segmentation. To overcome these problems, we propose a pyramid gradual-guidance network for RGB-D indoor scene semantic segmentation. First, the quality of depth information is improved by a modality-enhancement fusion module and RGB image fusion. Then, the representation of semantic information is improved by multiscale operations. The two resulting adjacent features are used in a feature refinement module with an attention mechanism to extract semantic information. The features from adjacent modules are successively used to form an encoding pyramid, which can substantially reduce information loss and thereby ensure information integrity. Finally, we gradually integrate features at the same scale obtained from the encoding pyramid during decoding to obtain high-quality semantic segmentation. Experimental results obtained from two commonly used indoor scene datasets demonstrate that the proposed pyramid gradual-guidance network attains the highest level of performance in semantic segmentation, as compared to other existing methods.