{"title":"Illumination-guided dual-branch fusion network for partition-based image exposure correction","authors":"Jianming Zhang, Jia Jiang, Mingshuang Wu, Zhijian Feng, Xiangnan Shi","doi":"10.1016/j.jvcir.2024.104342","DOIUrl":"10.1016/j.jvcir.2024.104342","url":null,"abstract":"<div><div>Images captured in the wild often suffer from issues such as under-exposure, over-exposure, or sometimes a combination of both. These images tend to lose details and texture due to uneven exposure. The majority of image enhancement methods currently focus on correcting either under-exposure or over-exposure, but there are only a few methods available that can effectively handle these two problems simultaneously. In order to address these issues, a novel partition-based exposure correction method is proposed. Firstly, our method calculates the illumination map to generate a partition mask that divides the original image into under-exposed and over-exposed areas. Then, we propose a Transformer-based parameter estimation module to estimate the dual gamma values for partition-based exposure correction. Finally, we introduce a dual-branch fusion module to merge the original image with the exposure-corrected image to obtain the final result. It is worth noting that the illumination map plays a guiding role in both the dual gamma model parameters estimation and the dual-branch fusion. Extensive experiments demonstrate that the proposed method consistently achieves superior performance over state-of-the-art (SOTA) methods on 9 datasets with paired or unpaired samples. Our codes are available at <span><span>https://github.com/csust7zhangjm/ExposureCorrectionWMS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"106 ","pages":"Article 104342"},"PeriodicalIF":2.6,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142700395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced soft domain adaptation for object detection in the dark","authors":"Yunfei Bai , Chang Liu , Rui Yang , Xiaomao Li","doi":"10.1016/j.jvcir.2024.104337","DOIUrl":"10.1016/j.jvcir.2024.104337","url":null,"abstract":"<div><div>Unlike foggy conditions, domain adaptation is rarely facilitated in dark detection tasks due to the lack of dark datasets. We generate target low-light images via swapping the ring-shaped frequency spectrum of Exdark with Cityscapes, and surprisingly find the promotion is less satisfactory. The root lies in non-transferable alignment that excessively highlights dark backgrounds. To tackle this issue, we propose an Enhanced Soft Domain Adaptation (ESDA) framework to focus on background misalignment. Specifically, Soft Domain Adaptation (SDA) compensates for over-alignment of backgrounds by providing different soft labels for foreground and background samples. The Highlight Foreground (HF), by introducing center sampling, increases the number of high-quality background samples for training. Suppress Background (SB) weakens non-transferable background alignment by replacing foreground scores with backgrounds. Experimental results show SDA combined with HF and SB is sufficiently strengthened and achieves state-of-the-art performance using multiple cross-domain benchmarks. Note that ESDA yields 11.8% relative improvement on the real-world ExDark dataset.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"106 ","pages":"Article 104337"},"PeriodicalIF":2.6,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142743784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HRGUNet: A novel high-resolution generative adversarial network combined with an improved UNet method for brain tumor segmentation","authors":"Dongmei Zhou, Hao Luo, Xingyang Li, Shengbing Chen","doi":"10.1016/j.jvcir.2024.104345","DOIUrl":"10.1016/j.jvcir.2024.104345","url":null,"abstract":"<div><div>Brain tumor segmentation in MRI images is challenging due to variability in tumor characteristics and low contrast. We propose HRGUNet, which combines a high-resolution generative adversarial network with an improved UNet architecture to enhance segmentation accuracy. Our proposed GAN model uses an innovative discriminator design that is able to process complete tumor labels as input. This approach can better ensure that the generator produces realistic tumor labels compared to some existing GAN models that only use local features. Additionally, we introduce a Multi-Scale Pyramid Fusion (MSPF) module to improve fine-grained feature extraction and a Refined Channel Attention (RCA) module to enhance focus on tumor regions. In comparative experiments, our method was verified on the BraTS2020 and BraTS2019 data sets, and the average Dice coefficient increased by 1.5% and 1.2% respectively, and the Hausdorff distance decreased by 23.9% and 15.2% respectively, showing its robustness and generalization for segmenting complex tumor structures.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104345"},"PeriodicalIF":2.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panoramic Arbitrary Style Transfer with Deformable Distortion Constraints","authors":"Wujian Ye , Yue Wang , Yijun Liu , Wenjie Lin , Xin Xiang","doi":"10.1016/j.jvcir.2024.104344","DOIUrl":"10.1016/j.jvcir.2024.104344","url":null,"abstract":"<div><div>Neural style transfer is a prominent AI technique for creating captivating visual effects and enhancing user experiences. However, most current methods inadequately handle panoramic images, leading to a loss of original visual semantics and emotions due to insufficient structural feature consideration. To address this, a novel panorama arbitrary style transfer method named PAST-Renderer is proposed by integrating deformable convolutions and distortion constraints. The proposed method can dynamically adjust the position of the convolutional kernels according to the geometric structure of the input image, thereby better adapting to the spatial distortions and deformations in panoramic images. Deformable convolutions enable adaptive transformations on a two-dimensional plane, enhancing content and style feature extraction and fusion in panoramic images. Distortion constraints adjust content and style losses, ensuring semantic consistency in salience, edge, and depth of field with the original image. Experimental results show significant improvements, with the PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) of stylized panoramic images’ semantic maps increasing by approximately 2–4 dB and 0.1–0.3, respectively. Our method PAST-Renderer performs better in both artistic and realistic style transfer, preserving semantic integrity with natural colors, realistic edge details, and rich thematic content.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"106 ","pages":"Article 104344"},"PeriodicalIF":2.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142743787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Underwater image enhancement method via extreme enhancement and ultimate weakening","authors":"Yang Zhou , Qinghua Su , Zhongbo Hu , Shaojie Jiang","doi":"10.1016/j.jvcir.2024.104341","DOIUrl":"10.1016/j.jvcir.2024.104341","url":null,"abstract":"<div><div>The existing histogram-based methods for underwater image enhancement are prone to over-enhancement, which will affect the analysis of enhanced images. However, an idea that achieves contrast balance by enhancing and weakening the contrast of an image can address the problem. Therefore, an underwater image enhancement method based on extreme enhancement and ultimate weakening (EEUW) is proposed in this paper. This approach comprises two main steps. Firstly, an image with extreme contrast can be achieved by applying grey prediction evolution algorithm (GPE), which is the first time that GPE is introduced into dual-histogram thresholding method to find the optimal segmentation threshold for accurate segmentation. Secondly, a pure gray image can be obtained through a fusion strategy based on the grayscale world assumption to achieve the ultimate weakening. Experiments conducted on three standard underwater image benchmark datasets validate that EEUW outperforms the 10 state-of-the-art methods in improving the contrast of underwater images.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104341"},"PeriodicalIF":2.6,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level similarity transfer and adaptive fusion data augmentation for few-shot object detection","authors":"Songhao Zhu, Yi Wang","doi":"10.1016/j.jvcir.2024.104340","DOIUrl":"10.1016/j.jvcir.2024.104340","url":null,"abstract":"<div><div>Few-shot object detection method aims to learn novel classes through a small number of annotated novel class samples without having a catastrophic impact on previously learned knowledge, thereby expanding the trained model’s ability to detect novel classes. For existing few-shot object detection methods, there is a prominent false positive issue for the novel class samples due to the similarity in appearance features and feature distribution between the novel classes and the base classes. That is, the following two issues need to be solved: (1) How to detect these false positive samples in large-scale dataset, and (2) How to utilize the correlations between these false positive samples and other samples to improve the accuracy of the detection model. To address the first issue, an adaptive fusion data augmentation strategy is utilized to enhance the diversity of novel class samples and further alleviate the issue of false positive novel class samples. To address the second issue, a similarity transfer strategy is here proposed to effectively utilize the correlations between different categories. Experimental results demonstrate that the proposed method performs well in various settings of PASCAL VOC and MSCOCO datasets, achieving 48.7 and 11.3 on PASCAL VOC and MSCOCO under few-shot settings (shot = 1) in terms of nAP50 respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104340"},"PeriodicalIF":2.6,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color image watermarking using vector SNCM-HMT","authors":"Hongxin Wang, Runtong Ma, Panpan Niu","doi":"10.1016/j.jvcir.2024.104339","DOIUrl":"10.1016/j.jvcir.2024.104339","url":null,"abstract":"<div><div>An image watermarking scheme is typically evaluated using three main conflicting characteristics: imperceptibility, robustness, and capacity. Developing a good image watermarking method is challenging because it requires a trade-off between these three basic characteristics. In this paper, we proposed a statistical color image watermarking based on robust discrete nonseparable Shearlet transform (DNST)-fast quaternion generic polar complex exponential transform (FQGPCET) magnitude and vector skew-normal-Cauchy mixtures (SNCM)-hidden Markov tree (HMT). The proposed watermarking system consists of two main parts: watermark inserting and watermark extraction. In watermark inserting, we first perform DNST on R, G, and B components of color host image, respectively. We then compute block FQGPCET of DNST domain color components, and embed watermark signal in DNST-FQGPCET magnitudes using multiplicative approach. In watermark extraction, we first analyze the robustness and statistical characteristics of local DNST-FQGPCET magnitudes of color image. We then observe that, vector SNCM-HMT model can capture accurately the marginal distribution and multiple strong dependencies of local DNST-FQGPCET magnitudes. Meanwhile, vector SNCM-HMT parameters can be computed effectively using variational expectation–maximization (VEM) parameter estimation. Motivated by our modeling results, we finally develop a new statistical color image watermark decoder based on vector SNCM-HMT and maximum likelihood (ML) decision rule. Experimental results on extensive test images demonstrate that the proposed statistical color image watermarking provides a performance better than that of most of the state-of-the-art statistical methods and some deep learning approaches recently proposed in the literature.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104339"},"PeriodicalIF":2.6,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A memory access number constraint-based string prediction technique for high throughput SCC implemented in AVS3","authors":"Liping Zhao , Zuge Yan , Keli Hu , Sheng Feng , Jiangda Wang , Xueyan Cao , Tao Lin","doi":"10.1016/j.jvcir.2024.104338","DOIUrl":"10.1016/j.jvcir.2024.104338","url":null,"abstract":"<div><div>String prediction (SP) is a highly efficient screen content coding (SCC) tool that has been adopted in international and Chinese video coding standards. SP exhibits a highly flexible and efficient ability to predict repetitive matching patterns. However, SP also suffers from low throughput of decoded display output pixels per memory access, which is synchronized with the decoder clock, due to the high number of memory accesses required to decode an SP coding unit for display. Even in state-of-the-art (SOTA) SP, the worst-case scenario involves two memory accesses for decoding each 4-pixel basic string unit across two memory access units, resulting in a throughput as low as two pixels per memory access (PPMA). To solve this problem, we are the first to propose a technique called memory access number constraint-based string prediction (MANC-SP) to achieve high throughput in SCC. First, a novel MANC-SP framework is proposed, a well-designed memory access number constraint rule is established on the basis of statistical data, and a constrained RDO-based string searching method is presented. Compared with the existing SOTA SP, the experimental results demonstrate that MANC-SP can improve the throughput from 2 to 2.67 PPMA, achieving a throughput improvement of <strong>33.33%</strong> while maintaining a negligible impact on coding efficiency and complexity.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104338"},"PeriodicalIF":2.6,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster-slow network fused with enhanced fine-grained features for action recognition","authors":"Xuegang Wu , Jiawei Zhu , Liu Yang","doi":"10.1016/j.jvcir.2024.104328","DOIUrl":"10.1016/j.jvcir.2024.104328","url":null,"abstract":"<div><div>Two-stream methods, which separate human actions and backgrounds into temporal and spatial streams visually, have shown promising results in action recognition datasets. However, prior researches emphasize motion modeling but overlook the robust correlation between motion features and spatial information, causing restriction of the model’s ability to recognize behaviors entailing occlusions or rapid changes. Therefore, we introduce Faster-slow, an improved framework for frame-level motion features. It introduces a Behavioural Feature Enhancement (BFE) module based on a novel two-stream network with different temporal resolutions. BFE consists of two components: MM, which incorporates motion-aware attention to capture dependencies between adjacent frames; STC, which enhances spatio-temporal and channel information to generate optimized features. Overall, BFE facilitates the extraction of finer-grained motion information, while ensuring a stable fusion of information across both streams. We evaluate the Faster-slow on the Atomic Visual Actions dataset, and the Faster-AVA dataset constructed in this paper, yielding promising experimental results.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104328"},"PeriodicalIF":2.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding","authors":"Hongyue Huang , Chen Cui , Chuanmin Jia , Xinfeng Zhang , Siwei Ma","doi":"10.1016/j.jvcir.2024.104329","DOIUrl":"10.1016/j.jvcir.2024.104329","url":null,"abstract":"<div><div>Previous research demonstrated that filtering Macro-Pixels (MPs) in a decoded Light Field Image (LFI) sequence can effectively enhances the quality of the corresponding Sub-Aperture Images (SAIs). In this paper, we propose a deep-learning-based quality enhancement model following the MP-wise processing approach tailored to LFIs encoded by the Versatile Video Coding (VVC) standard. The proposed novel Res2Net Quality Enhancement Convolutional Neural Network (R2NQE-CNN) architecture is both lightweight and powerful, in which the Res2Net modules are employed to perform LFI filtering for the first time, and are implemented with a novel improved 3D-feature-processing structure. The proposed method incorporates only 205K model parameters and achieves significant Y-BD-rate reductions over VVC of up to 32%, representing a relative improvement of up to 33% compared to the state-of-the-art method, which has more than three times the number of parameters of our proposed model.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104329"},"PeriodicalIF":2.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}