{"title":"A robust watermarking approach for medical image authentication using dual image and quorum function","authors":"","doi":"10.1016/j.jvcir.2024.104299","DOIUrl":"10.1016/j.jvcir.2024.104299","url":null,"abstract":"<div><div>To safeguard the identity and copyright of a patient’s medical documents, watermarking strategies are widely used. This work provides a new dual image-based watermarking approach using the quorum function (QF) and AD interpolation technique. AD interpolation is used to create the dual images which helps to increase the embedding capacity. Moreover, the rules for using the QF are designed in such a way, that the original bits are least affected after embedding. As a result, it increases the visual quality of the stego images. A shared secret key has been employed to protect the information hidden in the medical image and to maintain the privacy and confidentiality. The experimental result using PSNR, SSIM, NCC, and EC shows that the suggested technique gives an average PSNR of 68.44 dB and SSIM is close to 0.99 after inserting 786432 watermark bits, which demonstrates the superiority of the scheme over other state-of-the-art schemes.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust text watermarking based on average skeleton mass of characters against cross-media attacks","authors":"","doi":"10.1016/j.jvcir.2024.104300","DOIUrl":"10.1016/j.jvcir.2024.104300","url":null,"abstract":"<div><div>The wide spread of digital documents makes it essential to protect intellectual property and information security. As a key method of digital copyright protection, robust document watermarking technology has attracted much attention in this context. With the rapid development of current electronic devices, the ways of document theft are no longer limited to copy and transmission. Due to the convenient and fast shooting operation of the camera on paper or screen, current text watermarking methods need to be robust to cope with cross-media transmission. To realize the corresponding robust text watermarking, a text watermarking scheme based on the average skeleton mass of characters is proposed in this paper, and the average skeleton mass of adjacent characters is used to represent the watermark information. In this paper, a watermarking scheme is designed to modify character pixels, which can modify glyphs without loss of transparency and provide high embedding capacity. Compared with the existing manually designed font-based text watermarking schemes, this scheme does not need to accurately segment characters, nor does it rely on stretching characters to the same size for matching, which reduces the need for character segmentation. 
In addition, the experimental results show that the proposed watermarking scheme can be robust to the information transmission modes including print-scan, print-camera and screen-camera.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective image compression using hybrid DCT and hybrid capsule auto encoder for brain MR images","authors":"","doi":"10.1016/j.jvcir.2024.104296","DOIUrl":"10.1016/j.jvcir.2024.104296","url":null,"abstract":"<div><div>Nowadays, image compression is gaining popularity in various fields because of its storage and transmission capability. This work aims to introduce a medical image (MI) compression model in brain magnetic resonance images (MRI) to mitigate issues in bandwidth and storage. Initially, pre-processing is done to neglect the noises in inputs using the Adaptive Linear Smoothing and Histogram Equalization (ALSHE) method. Then, the Region of Interest (ROI) and Non-ROI parts are separately segmented by the Optimized Fuzzy C-Means (OFCM) approach for reducing high complexity issues. Finally, a novel Hybrid Discrete Cosine Transform-Improved Zero Wavelet (DCT-IZW) is proposed for lossless compression and Hybrid Equilibrium Optimization-Capsule Auto Encoder (EO-CAE) for lossy compression. Then, the compressed ROI and Non-ROI images are added together, and the inverse operation of the compression process is performed to obtain the reconstructed image. This study used BRATS (2015, 2018) datasets for simulation and attained better performance than other existing methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142356791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diving deep into human action recognition in aerial videos: A survey","authors":"","doi":"10.1016/j.jvcir.2024.104298","DOIUrl":"10.1016/j.jvcir.2024.104298","url":null,"abstract":"<div><div>Human Action Recognition from Unmanned Aerial Vehicles is a dynamic research domain with significant benefits in scale, mobility, deployment, and covert observation. This paper offers a comprehensive review of state-of-the-art algorithms for human action recognition and provides a novel taxonomy that categorizes the reviewed methods into two broad categories: Localization based and Globalization based. These categories are defined by how actions are segmented from visual data and how their spatial and temporal structures are modeled. We examine these techniques, highlighting their strengths and limitations, and provide essential background on human action recognition, including fundamental concepts and challenges in aerial videos. Additionally, we discuss existing datasets, enabling a comparative analysis. This survey identifies gaps and suggests future research directions, serving as a catalyst for advancing human action recognition in aerial videos. To our knowledge, this is the first detailed review of this kind.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142318711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-CSC: Low-light image enhancement with zero-reference color self-calibration","authors":"","doi":"10.1016/j.jvcir.2024.104293","DOIUrl":"10.1016/j.jvcir.2024.104293","url":null,"abstract":"<div><p>Zero-Reference Low-Light Image Enhancement (LLIE) techniques mainly focus on grey-scale inhomogeneities, and few methods consider how to explicitly recover a dark scene to achieve enhancements in color and overall illumination. In this paper, we introduce a novel Zero-Reference Color Self-Calibration framework for enhancing low-light images, termed as Zero-CSC. It effectively emphasizes channel-wise representations that contain fine-grained color information, achieving a natural result in a progressive manner. Furthermore, we propose a Light Up (LU) module with large-kernel convolutional blocks to improve overall illumination, which is implemented with a simple U-Net and further simplified with a light-weight structure. Experiments on representative datasets show that our model consistently achieves state-of-the-art performance in image signal-to-noise ratio, structural similarity, and color accuracy, setting new records on the challenging SICE dataset with improvements of 23.7% in image signal-to-noise ratio and 5.3% in structural similarity compared to the most advanced methods.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142271204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"M-YOLOv8s: An improved small target detection algorithm for UAV aerial photography","authors":"","doi":"10.1016/j.jvcir.2024.104289","DOIUrl":"10.1016/j.jvcir.2024.104289","url":null,"abstract":"<div><div>The object of UAV target detection usually means small target with complicated backgrounds. In this paper, an object detection model M-YOLOv8s based on UAV aerial photography scene is proposed. Firstly, to solve the problem that the YOLOv8s model cannot adapt to small target detection, a small target detection head (STDH) module is introduced to fuse the location and appearance feature information of the shallow layers of the backbone network. Secondly, Inner-Wise intersection over union (Inner-WIoU) is designed as the boundary box regression loss, and auxiliary boundary calculation is used to accelerate the regression speed of the model. Thirdly, the structure of multi-scale feature pyramid network (MS-FPN) can effectively combine the shallow network information with the deep network information and improve the performance of the detection model. Furthermore, a multi-scale cross-spatial attention (MCSA) module is proposed to expand the feature space through multi-scale branch, and then achieves the aggregation of target features through cross-spatial interaction, which improves the ability of the model to extract target features. Finally, the experimental results show that our model does not only possess fewer parameters, but also the values of mAP<sub>0.5</sub> are 6.6% and 5.4% higher than the baseline model on the Visdrone2019 validation dataset and test dataset, respectively. 
Then, as a conclusion, the M-YOLOv8s model achieves better detection performance than some existing ones, indicating that our proposed method can be more suitable for detecting the small targets.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142315099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-complexity content-aware encoding optimization of batch video","authors":"","doi":"10.1016/j.jvcir.2024.104295","DOIUrl":"10.1016/j.jvcir.2024.104295","url":null,"abstract":"<div><p>With the proliferation of short-form video traffic, video service providers are faced with the challenge of balancing video quality and bandwidth consumption while processing massive volumes of videos. The most straightforward and simplistic approach is to set uniformly encoding parameters to all videos. However, such an approach fails to consider the differences in video content, and there may be alternative encoding parameter configuration approach that can improve global coding efficiency. Finding the optimal combination of encoding parameter configurations for a batch of videos requires an amount of redundant encoding, thereby introducing significant computational costs. To address this issue, we propose a low-complexity encoding parameter prediction model that can adaptively adjust the values of the encoding parameters based on video content. The experiments show that when only changing the value of the encoding parameter CRF, our prediction model can achieve 27.04%, 6.11%, and 15.92% bit saving in terms of PSNR, SSIM, and VMAF respectively, while maintaining an acceptable complexity compared to the approach using the same CRF value.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142239487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging occupancy map to accelerate video-based point cloud compression","authors":"","doi":"10.1016/j.jvcir.2024.104292","DOIUrl":"10.1016/j.jvcir.2024.104292","url":null,"abstract":"<div><p>Video-based Point Cloud Compression enables point cloud streaming over the internet by converting dynamic 3D point clouds to 2D geometry and attribute videos, which are then compressed using 2D video codecs like H.266/VVC. However, the complex encoding process of H.266/VVC, such as the quadtree with nested multi-type tree (QTMT) partition, greatly hinders the practical application of V-PCC. To address this issue, we propose a fast CU partition method dedicated to V-PCC to accelerate the coding process. Specifically, we classify coding units (CUs) of projected images into three categories based on the occupancy map of a point cloud: unoccupied, partially occupied, and fully occupied. Subsequently, we employ either statistic-based rules or machine-learning models to manage the partition of each category. For unoccupied CUs, we terminate the partition directly; for partially occupied CUs with explicit directions, we selectively skip certain partition candidates; for the remaining CUs (partially occupied CUs with complex directions and fully occupied CUs), we train an edge-driven LightGBM model to predict the partition probability of each partition candidate automatically. Only partitions with high probabilities are retained for further Rate–Distortion (R–D) decisions. 
Comprehensive experiments demonstrate the superior performance of our proposed method: under the V-PCC common test conditions, our method reduces encoding time by 52% and 44% in geometry and attribute, respectively, while incurring only 0.68% (0.66%) BD-Rate loss in D1 (D2) measurements and 0.79% (luma) BD-Rate loss in attribute, significantly surpassing state-of-the-art works.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142239471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SR4KVQA: Video quality assessment database and metric for 4K super-resolution","authors":"","doi":"10.1016/j.jvcir.2024.104290","DOIUrl":"10.1016/j.jvcir.2024.104290","url":null,"abstract":"<div><p>The quality assessment for 4K super-resolution (SR) videos can be conducive to the optimization of video SR algorithms. To improve the subjective and objective consistency of the SR quality assessment, a 4K video database and a blind metric are proposed in this paper. In the database SR4KVQA, there are 30 4K pristine videos, from which 600 SR 4K distorted videos with mean opinion score (MOS) labels are generated by three classic interpolation methods, six SR algorithms based on the deep neural network (DNN), and two SR algorithms based on the generative adversarial network (GAN). The benchmark experiment of the proposed database indicates that video quality assessment (VQA) of the 4K SR videos is challenging for the existing metrics. Among those metrics, the Video-Swin-Transformer backbone demonstrates tremendous potential in the VQA task. Accordingly, a blind VQA metric based on the Video-Swin-Transformer backbone is established, where the normalized loss function and optimized spatio-temporal sampling strategy are applied. The experiment result manifests that the Pearson linear correlation coefficient (PLCC) and Spearman rank-order correlation coefficient (SROCC) of the proposed metric reach 0.8011 and 0.8275 respectively on the SR4KVQA database, which outperforms or competes with the state-of-the-art VQA metrics. 
The database and the code proposed in this paper are available in the GitHub repository, <span><span>https://github.com/AlexReadyNico/SR4KVQA</span></span>.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142239470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data compensation and feature fusion for sketch based person retrieval","authors":"","doi":"10.1016/j.jvcir.2024.104287","DOIUrl":"10.1016/j.jvcir.2024.104287","url":null,"abstract":"<div><div>Sketch re-identification (Re-ID) aims to retrieve pedestrian photo in the gallery dataset by a query sketch drawn by professionals. The sketch Re-ID task has not been adequately studied because collecting such sketches is difficult and expensive. In addition, the significant modality difference between sketches and images makes extracting the discriminative feature information difficult. To address above issues, we introduce a novel sketch-style pedestrian dataset named Pseudo-Sketch dataset. Our proposed dataset maximizes the utilization of the existing person dataset resources and is freely available, thus effectively reducing the expenses associated with the training and deployment phases. Furthermore, to mitigate the modality gap between sketches and visible images, a cross-modal feature fusion network is proposed that incorporates information from each modality. Experiment results show that the proposed Pseudo-Sketch dataset can effectively complement the real sketch dataset, and the proposed network obtains competitive results than SOTA methods. The dataset will be released later.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142312111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}