Fátima Belén Paiva Pavón, María Cristina Orué Gil, José Luis Vázquez Noguera, Helena Gómez-Adorno, Valentín Calzada-Ledesma
{"title":"RGB pixel n-grams: A texture descriptor","authors":"Fátima Belén Paiva Pavón, María Cristina Orué Gil, José Luis Vázquez Noguera, Helena Gómez-Adorno, Valentín Calzada-Ledesma","doi":"10.1016/j.image.2023.117028","DOIUrl":"https://doi.org/10.1016/j.image.2023.117028","url":null,"abstract":"This article proposes the “RGB Pixel N-grams” descriptor, which uses a sequence of n pixels to represent RGB color texture images. We conducted classification experiments with three different classifiers and five color texture image databases to evaluate the descriptor’s performance, using accuracy as the evaluation metric. These databases include various textures from different surfaces, sometimes under different lighting, scale, or rotation conditions. The proposed descriptor proved to be robust and competitive compared to other state-of-the-art descriptors, as it has better accuracy in classification results in most databases and classifiers.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117028"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual attention guided multi-scale fusion network for RGB-D salient object detection","authors":"Huan Gao, Jichang Guo, Yudong Wang, Jianan Dong","doi":"10.1016/j.image.2023.117004","DOIUrl":"https://doi.org/10.1016/j.image.2023.117004","url":null,"abstract":"While recent research on salient object detection (SOD) has shown remarkable progress in leveraging both RGB and depth data, it is still worth exploring how to use the inherent relationship between the two to extract and fuse features more effectively, and further make more accurate predictions. In this paper, we consider combining the attention mechanism with the characteristics of the SOD, proposing the Dual Attention Guided Multi-scale Fusion Network. We design the multi-scale fusion block by combining multi-scale branches with channel attention to achieve better fusion of RGB and depth information. Using the characteristic of the SOD, the dual attention module is proposed to make the network pay more attention to the currently unpredicted saliency regions and the wrong parts in the already predicted regions. We perform an ablation study to verify the effectiveness of each component. Quantitative and qualitative experimental results demonstrate that our method achieves state-of-the-art (SOTA) performance.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117004"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49844962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wencheng Wang, Dongliang Yan, Xiaojin Wu, Weikai He, Zhenxue Chen, Xiaohui Yuan, Lun Li
{"title":"Low-light image enhancement based on virtual exposure","authors":"Wencheng Wang, Dongliang Yan, Xiaojin Wu, Weikai He, Zhenxue Chen, Xiaohui Yuan, Lun Li","doi":"10.1016/j.image.2023.117016","DOIUrl":"https://doi.org/10.1016/j.image.2023.117016","url":null,"abstract":"Under poor illumination, the image information captured by a camera is partially lost, which seriously affects the visual perception of the human. Inspired by the idea that the fusion of multiexposure images can yield one high-quality image, an adaptive enhancement framework for a single low-light image is proposed based on the strategy of virtual exposure. In this framework, the exposure control parameters are adaptively generated through a statistical analysis of the low-light image, and a virtual exposure enhancer constructed by a quadratic function is applied to generate several image frames from a single input image. Then, on the basis of generating weight maps by three factors, i.e., contrast, saturation and saliency, the image sequences and weight images are transformed by a Laplacian pyramid and Gaussian pyramid, respectively, and multiscale fusion is implemented layer by layer. Finally, the enhanced result is obtained by pyramid reconstruction rule. Compared with the experimental results of several state-of-the-art methods on five datasets, the proposed method shows its superiority on several image quality evaluation metrics. This method requires neither image calibration nor camera response function estimation and has a more flexible application range. It can weaken the possibility of overenhancement, effectively avoid the appearance of a halo in the enhancement results, and adaptively improve the visual information fidelity.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117016"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49881553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determination of Lagrange multipliers for interframe EZBC/JP2K","authors":"Yuan Liu, John W. Woods","doi":"10.1016/j.image.2023.117030","DOIUrl":"https://doi.org/10.1016/j.image.2023.117030","url":null,"abstract":"Interframe EZBC/JP2K has been shown to be an effective fine-grain scalable video coding system. However, its Lagrange multiplier values for motion estimation of multiple temporal levels are not specified, and must be specified by the user in the config file in order to run the program. In this paper, we investigate how to select these Lagrange parameters for optimized performance. By designing an iterative mechanism, we make it possible for the encoder to adaptively select Lagrange multipliers based on the feedback of Y-PSNR closed GOP performance. Experimental results regarding both classic test video clips and their concatenations are obtained and discussed. We also present a new analytical model for optimized Lagrange multiplier selection in terms of target Y-PSNR.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117030"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep steerable pyramid wavelet network for unified JPEG compression artifact reduction","authors":"Yi Zhang, Damon M. Chandler, Xuanqin Mou","doi":"10.1016/j.image.2023.117011","DOIUrl":"https://doi.org/10.1016/j.image.2023.117011","url":null,"abstract":"Although numerous methods have been proposed to remove blocking artifacts in JPEG-compressed images, one important issue not well addressed so far is the construction of a unified model that requires no prior knowledge of the JPEG encoding parameters to operate effectively on different compression-level images (grayscale/color) while occupying relatively small storage space to save and run. To address this issue, in this paper, we present a unified JPEG compression artifact reduction model called DSPW-Net, which employs (1) the deep steerable pyramid wavelet transform network for Y-channel restoration, and (2) the classic U-Net architecture for CbCr-channel restoration. To enable our model to work effectively on images with a wide range of compression levels, the quality factor (QF) related features extracted by the convolutional layers in the QF-estimation network are incorporated in the two restoration branches. Meanwhile, recursive blocks with shared parameters are utilized to drastically reduce model parameters and shared-source residual learning is employed to avoid the gradient vanishing/explosion problem in training. Extensive quantitative and qualitative results tested on various benchmark datasets demonstrate the effectiveness of our model as compared with other state-of-the-art deblocking methods.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117011"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soccer line mark segmentation and classification with stochastic watershed transform","authors":"Daniel Berjón, Carlos Cuevas, Narciso García","doi":"10.1016/j.image.2023.117014","DOIUrl":"https://doi.org/10.1016/j.image.2023.117014","url":null,"abstract":"Augmented reality applications are beginning to change the way sports are broadcast, providing richer experiences and valuable insights to fans. The first step of augmented reality systems is camera calibration, possibly based on detecting the line markings of the playing field. Most existing proposals for line detection rely on edge detection and Hough transform, but radial distortion and extraneous edges cause inaccurate or spurious detections of line markings. We propose a novel strategy to automatically and accurately segment and classify line markings. First, line points are segmented thanks to a stochastic watershed transform that is robust to radial distortions, since it makes no assumptions about line straightness, and is unaffected by the presence of players or the ball. The line points are then linked to primitive structures (straight lines and ellipses) thanks to a very efficient procedure that makes no assumptions about the number of primitives that appear in each image. The strategy has been tested on a new and public database composed by 60 annotated images from matches in five stadiums. The results obtained have proven that the proposed strategy is more robust and accurate than existing approaches, achieving successful line mark detection even under challenging conditions.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117014"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49845015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiao Wu, Mingyang Ma, Shuai Wan, Xiuxiu Han, Shaohui Mei
{"title":"Multi-scale deep feature fusion based sparse dictionary selection for video summarization","authors":"Xiao Wu, Mingyang Ma, Shuai Wan, Xiuxiu Han, Shaohui Mei","doi":"10.1016/j.image.2023.117006","DOIUrl":"https://doi.org/10.1016/j.image.2023.117006","url":null,"abstract":"The explosive growth of video data constitutes a series of new challenges in computer vision, and the function of video summarization (VS) is becoming more and more prominent. Recent works have shown the effectiveness of sparse dictionary selection (SDS) based VS, which selects a representative frame set to sufficiently reconstruct a given video. Existing SDS based VS methods use conventional handcrafted features or single-scale deep features, which could diminish their summarization performance due to the underutilization of frame feature representation. Deep learning techniques based on convolutional neural networks (CNNs) exhibit powerful capabilities among various vision tasks, as the CNN provides excellent feature representation. Therefore, in this paper, a multi-scale deep feature fusion based sparse dictionary selection (MSDFF-SDS) is proposed for VS. Specifically, multi-scale features include the directly extracted features from the last fully connected layer and the global average pooling (GAP) processed features from intermediate layers, then VS is formulated as a problem of minimizing the reconstruction error using the multi-scale deep feature fusion. In our formulation, the contribution of each scale of features can be adjusted by a balance parameter, and the row-sparsity consistency of the simultaneous reconstruction coefficient is used to select as few keyframes as possible. The resulting MSDFF-SDS model is solved by using an efficient greedy pursuit algorithm. Experimental results on two benchmark datasets demonstrate that the proposed MSDFF-SDS improves the F-score of keyframe based summarization more than 3% compared with the existing SDS methods, and performs better than most deep-learning methods for skimming based summarization.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117006"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49844963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongkui Wang, Li Yu, Hailang Yang, Haifeng Xu, Haibing Yin, Guangtao Zhai, Tianzong Li, Zhuo Kuang
{"title":"Surprise-based JND estimation for perceptual quantization in H.265/HEVC codecs","authors":"Hongkui Wang, Li Yu, Hailang Yang, Haifeng Xu, Haibing Yin, Guangtao Zhai, Tianzong Li, Zhuo Kuang","doi":"10.1016/j.image.2023.117019","DOIUrl":"https://doi.org/10.1016/j.image.2023.117019","url":null,"abstract":"Just noticeable distortion (JND), reflecting the perceptual redundancy directly, has been widely used in image and video compression. However, the human visual system (HVS) is extremely complex and the visual signal processing has not been fully understood, which result in existing JND models are not accurate enough and the bitrate saving of JND-based perceptual compression schemes is limited. This paper presents a novel pixel-based JND model for videos and a JND-based perceptual quantization scheme for HEVC codecs. In particular, positive and negative perception effects of the inter-frame difference and the motion information are analyzed and measured with an information-theoretic approach. Then, a surprise-based JND model is developed for perceptual video coding (PVC). In our PVC scheme, the frame-level perceptual quantization parameter (QP) is derived on the premise that the coding distortion is infinitely close to the estimated JND threshold. On the basis of the frame-level perceptual QP, we determine the perceptual QP for each coding unit through a perceptual adjustment function to achieve better perceptual quality. Experimental results indicate that the proposed JND model outperforms existing models significantly, the proposed perceptual quantization scheme improves video compression efficiency with better perceptual quality and lower coding complexity.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117019"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49881551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint adjustment image steganography networks","authors":"Le Zhang, Yao Lu, Tong Li, Guangming Lu","doi":"10.1016/j.image.2023.117022","DOIUrl":"https://doi.org/10.1016/j.image.2023.117022","url":null,"abstract":"Image steganography aims to achieve covert communication between two partners utilizing stego images generated by hiding secret images within cover images. Existing deep image steganography methods have been rapidly developed in this area. Such methods, however, usually generate the stego images and reveal the secret images using one-process networks, lacking sufficient refinement in these methods. Thus, the security and quality of stego and revealed secret images still have much room for promotion, especially for large-capacity image steganography. This paper proposes Joint Adjustment Image Steganography Networks (JAIS-Nets), containing a series of coarse-to-fine iterative adjustment processes, for image steganography. Our JAIS-Nets first proposes Cross-Process Contrastive Refinement (CPCR) adjustment method, using the cross-process contrastive information from cover-stego and secret-revealed secret image pairs, to iteratively refine the generated stego and revealed secret images, respectively. In addition, our JAIS-Nets further proposes Cross-Process Multi-Scale (CPMS) adjustment method, using the cross-process multi-scale information from different scales cover-stego and secret-revealed secret image pairs, to directly adjust and enhance the intermediate representations of the proposed JAIS-Nets. Integrating the proposed CPCR with CPMS methods, the proposed JAIS-Nets can jointly adjust the quality of the stego and revealed secret images at both the learning process and image scale levels. Extensive experiments demonstrate that our JAIS-Nets can achieve state-of-the-art performances on the security and quality of the stego and revealed secret images on both the regular and large capacity image steganography.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"118","pages":"Article 117022"},"PeriodicalIF":3.5,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49896214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felix S.K. Yu, Yuk-Hee Chan, Kenneth K.M. Lam, Daniel P.K. Lun
{"title":"Self-embedding reversible color-to-grayscale conversion with watermarking feature","authors":"Felix S.K. Yu, Yuk-Hee Chan, Kenneth K.M. Lam, Daniel P.K. Lun","doi":"10.1016/j.image.2023.117061","DOIUrl":"https://doi.org/10.1016/j.image.2023.117061","url":null,"abstract":"This paper presents a self-embedding reversible color-to-grayscale conversion (RCGC) algorithm that makes good use of deep learning, vector quantization, and halftoning techniques to achieve its goals. By decoupling the luminance information of a pixel from its chrominance information, it explicitly controls the luminance error of both the conversion outputs and their corresponding reconstructed color images. It can also alleviate the burden of the deep learning network used to restore the embedded chrominance information during the reconstruction of the color image. Luminance-guided chrominance quantization and checkerboard-based halftoning are introduced in the paper to encode the chrominance information to be embedded while reference-guided inverse halftoning is proposed to restore the color image. Simulation results verify that its performance is remarkably superior to conventional state-of-art RCGC algorithms in various measures. In the aspect of authentication, embedding the watermark and chrominance information is realized with context-based pixel-wise encryption and a key-based watermark bit positioning mechanism, which makes us possible to locate tampered regions and prevent unauthorized use of the chrominance information.","PeriodicalId":49521,"journal":{"name":"Signal Processing-Image Communication","volume":"119","pages":"Article 117061"},"PeriodicalIF":3.5,"publicationDate":"2023-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49838796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}