{"title":"HRGUNet: A novel high-resolution generative adversarial network combined with an improved UNet method for brain tumor segmentation","authors":"Dongmei Zhou, Hao Luo, Xingyang Li, Shengbing Chen","doi":"10.1016/j.jvcir.2024.104345","DOIUrl":"10.1016/j.jvcir.2024.104345","url":null,"abstract":"<div><div>Brain tumor segmentation in MRI images is challenging due to variability in tumor characteristics and low contrast. We propose HRGUNet, which combines a high-resolution generative adversarial network with an improved UNet architecture to enhance segmentation accuracy. Our proposed GAN model uses an innovative discriminator design that is able to process complete tumor labels as input. This approach can better ensure that the generator produces realistic tumor labels compared to some existing GAN models that only use local features. Additionally, we introduce a Multi-Scale Pyramid Fusion (MSPF) module to improve fine-grained feature extraction and a Refined Channel Attention (RCA) module to enhance focus on tumor regions. In comparative experiments, our method was verified on the BraTS2020 and BraTS2019 data sets, and the average Dice coefficient increased by 1.5% and 1.2% respectively, and the Hausdorff distance decreased by 23.9% and 15.2% respectively, showing its robustness and generalization for segmenting complex tumor structures.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104345"},"PeriodicalIF":2.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Panoramic Arbitrary Style Transfer with Deformable Distortion Constraints","authors":"Wujian Ye , Yue Wang , Yijun Liu , Wenjie Lin , Xin Xiang","doi":"10.1016/j.jvcir.2024.104344","DOIUrl":"10.1016/j.jvcir.2024.104344","url":null,"abstract":"<div><div>Neural style transfer is a prominent AI technique for creating captivating visual effects and enhancing user experiences. However, most current methods inadequately handle panoramic images, leading to a loss of original visual semantics and emotions due to insufficient structural feature consideration. To address this, a novel panorama arbitrary style transfer method named PAST-Renderer is proposed by integrating deformable convolutions and distortion constraints. The proposed method can dynamically adjust the position of the convolutional kernels according to the geometric structure of the input image, thereby better adapting to the spatial distortions and deformations in panoramic images. Deformable convolutions enable adaptive transformations on a two-dimensional plane, enhancing content and style feature extraction and fusion in panoramic images. Distortion constraints adjust content and style losses, ensuring semantic consistency in salience, edge, and depth of field with the original image. Experimental results show significant improvements, with the PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) of stylized panoramic images’ semantic maps increasing by approximately 2–4 dB and 0.1–0.3, respectively. Our method PAST-Renderer performs better in both artistic and realistic style transfer, preserving semantic integrity with natural colors, realistic edge details, and rich thematic content.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"106 ","pages":"Article 104344"},"PeriodicalIF":2.6,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142743787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Underwater image enhancement method via extreme enhancement and ultimate weakening","authors":"Yang Zhou , Qinghua Su , Zhongbo Hu , Shaojie Jiang","doi":"10.1016/j.jvcir.2024.104341","DOIUrl":"10.1016/j.jvcir.2024.104341","url":null,"abstract":"<div><div>The existing histogram-based methods for underwater image enhancement are prone to over-enhancement, which will affect the analysis of enhanced images. However, an idea that achieves contrast balance by enhancing and weakening the contrast of an image can address the problem. Therefore, an underwater image enhancement method based on extreme enhancement and ultimate weakening (EEUW) is proposed in this paper. This approach comprises two main steps. Firstly, an image with extreme contrast can be achieved by applying grey prediction evolution algorithm (GPE), which is the first time that GPE is introduced into dual-histogram thresholding method to find the optimal segmentation threshold for accurate segmentation. Secondly, a pure gray image can be obtained through a fusion strategy based on the grayscale world assumption to achieve the ultimate weakening. Experiments conducted on three standard underwater image benchmark datasets validate that EEUW outperforms the 10 state-of-the-art methods in improving the contrast of underwater images.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104341"},"PeriodicalIF":2.6,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142706864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level similarity transfer and adaptive fusion data augmentation for few-shot object detection","authors":"Songhao Zhu, Yi Wang","doi":"10.1016/j.jvcir.2024.104340","DOIUrl":"10.1016/j.jvcir.2024.104340","url":null,"abstract":"<div><div>Few-shot object detection method aims to learn novel classes through a small number of annotated novel class samples without having a catastrophic impact on previously learned knowledge, thereby expanding the trained model’s ability to detect novel classes. For existing few-shot object detection methods, there is a prominent false positive issue for the novel class samples due to the similarity in appearance features and feature distribution between the novel classes and the base classes. That is, the following two issues need to be solved: (1) How to detect these false positive samples in large-scale dataset, and (2) How to utilize the correlations between these false positive samples and other samples to improve the accuracy of the detection model. To address the first issue, an adaptive fusion data augmentation strategy is utilized to enhance the diversity of novel class samples and further alleviate the issue of false positive novel class samples. To address the second issue, a similarity transfer strategy is here proposed to effectively utilize the correlations between different categories. Experimental results demonstrate that the proposed method performs well in various settings of PASCAL VOC and MSCOCO datasets, achieving 48.7 and 11.3 on PASCAL VOC and MSCOCO under few-shot settings (shot = 1) in terms of nAP50 respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104340"},"PeriodicalIF":2.6,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color image watermarking using vector SNCM-HMT","authors":"Hongxin Wang, Runtong Ma, Panpan Niu","doi":"10.1016/j.jvcir.2024.104339","DOIUrl":"10.1016/j.jvcir.2024.104339","url":null,"abstract":"<div><div>An image watermarking scheme is typically evaluated using three main conflicting characteristics: imperceptibility, robustness, and capacity. Developing a good image watermarking method is challenging because it requires a trade-off between these three basic characteristics. In this paper, we proposed a statistical color image watermarking based on robust discrete nonseparable Shearlet transform (DNST)-fast quaternion generic polar complex exponential transform (FQGPCET) magnitude and vector skew-normal-Cauchy mixtures (SNCM)-hidden Markov tree (HMT). The proposed watermarking system consists of two main parts: watermark inserting and watermark extraction. In watermark inserting, we first perform DNST on R, G, and B components of color host image, respectively. We then compute block FQGPCET of DNST domain color components, and embed watermark signal in DNST-FQGPCET magnitudes using multiplicative approach. In watermark extraction, we first analyze the robustness and statistical characteristics of local DNST-FQGPCET magnitudes of color image. We then observe that, vector SNCM-HMT model can capture accurately the marginal distribution and multiple strong dependencies of local DNST-FQGPCET magnitudes. Meanwhile, vector SNCM-HMT parameters can be computed effectively using variational expectation–maximization (VEM) parameter estimation. Motivated by our modeling results, we finally develop a new statistical color image watermark decoder based on vector SNCM-HMT and maximum likelihood (ML) decision rule. Experimental results on extensive test images demonstrate that the proposed statistical color image watermarking provides a performance better than that of most of the state-of-the-art statistical methods and some deep learning approaches recently proposed in the literature.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104339"},"PeriodicalIF":2.6,"publicationDate":"2024-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A memory access number constraint-based string prediction technique for high throughput SCC implemented in AVS3","authors":"Liping Zhao , Zuge Yan , Keli Hu , Sheng Feng , Jiangda Wang , Xueyan Cao , Tao Lin","doi":"10.1016/j.jvcir.2024.104338","DOIUrl":"10.1016/j.jvcir.2024.104338","url":null,"abstract":"<div><div>String prediction (SP) is a highly efficient screen content coding (SCC) tool that has been adopted in international and Chinese video coding standards. SP exhibits a highly flexible and efficient ability to predict repetitive matching patterns. However, SP also suffers from low throughput of decoded display output pixels per memory access, which is synchronized with the decoder clock, due to the high number of memory accesses required to decode an SP coding unit for display. Even in state-of-the-art (SOTA) SP, the worst-case scenario involves two memory accesses for decoding each 4-pixel basic string unit across two memory access units, resulting in a throughput as low as two pixels per memory access (PPMA). To solve this problem, we are the first to propose a technique called memory access number constraint-based string prediction (MANC-SP) to achieve high throughput in SCC. First, a novel MANC-SP framework is proposed, a well-designed memory access number constraint rule is established on the basis of statistical data, and a constrained RDO-based string searching method is presented. Compared with the existing SOTA SP, the experimental results demonstrate that MANC-SP can improve the throughput from 2 to 2.67 PPMA, achieving a throughput improvement of <strong>33.33%</strong> while maintaining a negligible impact on coding efficiency and complexity.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104338"},"PeriodicalIF":2.6,"publicationDate":"2024-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Faster-slow network fused with enhanced fine-grained features for action recognition","authors":"Xuegang Wu , Jiawei Zhu , Liu Yang","doi":"10.1016/j.jvcir.2024.104328","DOIUrl":"10.1016/j.jvcir.2024.104328","url":null,"abstract":"<div><div>Two-stream methods, which separate human actions and backgrounds into temporal and spatial streams visually, have shown promising results in action recognition datasets. However, prior researches emphasize motion modeling but overlook the robust correlation between motion features and spatial information, causing restriction of the model’s ability to recognize behaviors entailing occlusions or rapid changes. Therefore, we introduce Faster-slow, an improved framework for frame-level motion features. It introduces a Behavioural Feature Enhancement (BFE) module based on a novel two-stream network with different temporal resolutions. BFE consists of two components: MM, which incorporates motion-aware attention to capture dependencies between adjacent frames; STC, which enhances spatio-temporal and channel information to generate optimized features. Overall, BFE facilitates the extraction of finer-grained motion information, while ensuring a stable fusion of information across both streams. We evaluate the Faster-slow on the Atomic Visual Actions dataset, and the Faster-AVA dataset constructed in this paper, yielding promising experimental results.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104328"},"PeriodicalIF":2.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight macro-pixel quality enhancement network for light field images compressed by versatile video coding","authors":"Hongyue Huang , Chen Cui , Chuanmin Jia , Xinfeng Zhang , Siwei Ma","doi":"10.1016/j.jvcir.2024.104329","DOIUrl":"10.1016/j.jvcir.2024.104329","url":null,"abstract":"<div><div>Previous research demonstrated that filtering Macro-Pixels (MPs) in a decoded Light Field Image (LFI) sequence can effectively enhances the quality of the corresponding Sub-Aperture Images (SAIs). In this paper, we propose a deep-learning-based quality enhancement model following the MP-wise processing approach tailored to LFIs encoded by the Versatile Video Coding (VVC) standard. The proposed novel Res2Net Quality Enhancement Convolutional Neural Network (R2NQE-CNN) architecture is both lightweight and powerful, in which the Res2Net modules are employed to perform LFI filtering for the first time, and are implemented with a novel improved 3D-feature-processing structure. The proposed method incorporates only 205K model parameters and achieves significant Y-BD-rate reductions over VVC of up to 32%, representing a relative improvement of up to 33% compared to the state-of-the-art method, which has more than three times the number of parameters of our proposed model.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104329"},"PeriodicalIF":2.6,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142659529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TrMLGAN: Transmission MultiLoss Generative Adversarial Network framework for image dehazing","authors":"Pulkit Dwivedi, Soumendu Chakraborty","doi":"10.1016/j.jvcir.2024.104324","DOIUrl":"10.1016/j.jvcir.2024.104324","url":null,"abstract":"<div><div>Hazy environments significantly degrade image quality, leading to poor contrast and reduced visibility. Existing dehazing methods often struggle to predict the transmission map, which is crucial for accurate dehazing. This study introduces the Transmission MultiLoss Generative Adversarial Network (TrMLGAN), a novel framework designed to enhance transmission map estimation for improved dehazing. The transmission map is initially computed using a dark channel prior-based approach and refined using the TrMLGAN framework, which leverages Generative Adversarial Networks (GANs). By integrating multiple loss functions, such as adversarial, pixel-wise similarity, perceptual similarity, and SSIM losses, our method focuses on various aspects of image quality. This enables robust dehazing performance without direct dependence on ground-truth images. Evaluations using PSNR, SSIM, FADE, NIQE, BRISQUE, and SSEQ metrics show that TrMLGAN significantly outperforms state-of-the-art methods across datasets including D-HAZY, HSTS, SOTS Outdoor, NH-HAZE, and D-Hazy, validating its potential for real-world applications.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104324"},"PeriodicalIF":2.6,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142552678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Video Question Answering: A survey of the state-of-the-art","authors":"Jeshmol P.J., Binsu C. Kovoor","doi":"10.1016/j.jvcir.2024.104320","DOIUrl":"10.1016/j.jvcir.2024.104320","url":null,"abstract":"<div><div>Video Question Answering (VideoQA) emerges as a prominent trend in the domain of Artificial Intelligence, Computer Vision, and Natural Language Processing. It involves developing systems capable of understanding, analyzing, and responding to questions about the content of videos. The Proposed survey presents an in-depth overview of the current landscape of Question Answering, shedding light on the challenges, methodologies, datasets, and innovative approaches in the domain. The key components of the Video Question Answering (VideoQA) framework include video feature extraction, question processing, reasoning, and response generation. It underscores the importance of datasets in shaping VideoQA research and the diversity of question types, from factual inquiries to spatial and temporal reasoning. The survey highlights the ongoing research directions and future prospects for VideoQA. Finally, the proposed survey gives a road map for future explorations at the intersection of multiple disciplines, emphasizing the ultimate objective of pushing the boundaries of knowledge and innovation.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"105 ","pages":"Article 104320"},"PeriodicalIF":2.6,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}