Journal of Visual Communication and Image Representation — Latest Articles

A robust watermarking approach for medical image authentication using dual image and quorum function
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-10-01 DOI: 10.1016/j.jvcir.2024.104299
Abstract: To safeguard the identity and copyright of a patient's medical documents, watermarking strategies are widely used. This work provides a new dual-image-based watermarking approach using the quorum function (QF) and AD interpolation. AD interpolation is used to create the dual images, which helps to increase the embedding capacity. Moreover, the rules for using the QF are designed so that the original bits are least affected after embedding, which increases the visual quality of the stego images. A shared secret key protects the information hidden in the medical image and maintains privacy and confidentiality. Experimental results using PSNR, SSIM, NCC, and EC show that the suggested technique gives an average PSNR of 68.44 dB and an SSIM close to 0.99 after inserting 786,432 watermark bits, demonstrating the superiority of the scheme over other state-of-the-art schemes.
Citations: 0
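The abstract gauges imperceptibility with PSNR and SSIM after embedding. As a point of reference, here is a minimal Python sketch of how those two measures are computed for a cover/stego pair; the LSB-style embedding is a placeholder for illustration, not the paper's QF/AD-interpolation scheme.

```python
# Minimal sketch (not the paper's code): evaluating stego-image quality with
# the PSNR and SSIM measures the abstract reports. The cover image and the
# embedding step below are placeholders.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)  # stand-in cover image

# Placeholder "embedding": write one watermark bit into each pixel's LSB,
# roughly the distortion scale a capacity-oriented scheme aims to stay near.
watermark_bits = rng.integers(0, 2, size=cover.shape, dtype=np.uint8)
stego = (cover & 0xFE) | watermark_bits

psnr = peak_signal_noise_ratio(cover, stego, data_range=255)
ssim = structural_similarity(cover, stego, data_range=255)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```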
Robust text watermarking based on average skeleton mass of characters against cross-media attacks
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-10-01 DOI: 10.1016/j.jvcir.2024.104300
Abstract: The widespread use of digital documents makes it essential to protect intellectual property and information security. As a key method of digital copyright protection, robust document watermarking has attracted much attention in this context. With the rapid development of electronic devices, document theft is no longer limited to copying and transmission: because cameras can conveniently and quickly capture paper or screens, text watermarking methods must be robust to cross-media transmission. To this end, this paper proposes a text watermarking scheme based on the average skeleton mass of characters, where the average skeleton mass of adjacent characters represents the watermark information. The scheme modifies character pixels, altering glyphs without loss of transparency while providing high embedding capacity. Compared with existing manually designed font-based text watermarking schemes, it neither needs to segment characters accurately nor relies on stretching characters to a uniform size for matching. Experimental results show that the proposed scheme is robust to transmission modes including print–scan, print–camera, and screen–camera.
Citations: 0
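One plausible reading of "skeleton mass" is the pixel count of a character's morphological skeleton, normalized by glyph area; the sketch below computes that quantity and its average over adjacent characters. The normalization and the averaging granularity are assumptions, and the paper's embedding rule is not reproduced.

```python
# Hedged sketch: skeleton mass as skeleton pixels per ink pixel. The exact
# definition in the paper may differ; this only illustrates the carrier.
import numpy as np
from skimage.morphology import skeletonize

def skeleton_mass(glyph: np.ndarray) -> float:
    """glyph: 2D boolean array, True where the character has ink."""
    skeleton = skeletonize(glyph)                # 1-pixel-wide medial axis
    return skeleton.sum() / max(glyph.sum(), 1)  # skeleton pixels per ink pixel

def average_skeleton_mass(glyphs: list[np.ndarray]) -> float:
    """Average over a group of adjacent characters, used as the watermark carrier."""
    return float(np.mean([skeleton_mass(g) for g in glyphs]))
```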
Effective image compression using hybrid DCT and hybrid capsule auto encoder for brain MR images
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-10-01 DOI: 10.1016/j.jvcir.2024.104296
Abstract: Image compression is gaining popularity in various fields because of its storage and transmission benefits. This work introduces a medical image (MI) compression model for brain magnetic resonance images (MRI) to mitigate bandwidth and storage constraints. First, the inputs are pre-processed to remove noise using the Adaptive Linear Smoothing and Histogram Equalization (ALSHE) method. Then, the Region of Interest (ROI) and non-ROI parts are segmented separately by the Optimized Fuzzy C-Means (OFCM) approach to reduce complexity. Finally, a novel Hybrid Discrete Cosine Transform–Improved Zero Wavelet (DCT-IZW) is proposed for lossless compression and a Hybrid Equilibrium Optimization–Capsule Auto Encoder (EO-CAE) for lossy compression. The compressed ROI and non-ROI images are then combined, and the inverse of the compression process is applied to obtain the reconstructed image. Simulations on the BRATS (2015, 2018) datasets show better performance than existing methods.
Citations: 0
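The hybrid DCT stage builds on classical block transform coding. The generic sketch below shows an 8x8 block DCT with small-coefficient truncation and inverse reconstruction; it illustrates the principle only and is not the paper's DCT-IZW or EO-CAE pipeline.

```python
# Generic transform-coding illustration: keep the largest-magnitude DCT
# coefficients of an 8x8 block and reconstruct. Not the paper's method.
import numpy as np
from scipy.fft import dctn, idctn

def compress_block(block: np.ndarray, keep: int = 10) -> np.ndarray:
    """Keep only the `keep` largest-magnitude DCT coefficients of one 8x8 block."""
    coeffs = dctn(block, norm="ortho")
    threshold = np.sort(np.abs(coeffs).ravel())[-keep]
    coeffs[np.abs(coeffs) < threshold] = 0.0     # discard small coefficients
    return idctn(coeffs, norm="ortho")           # lossy reconstruction

block = np.random.default_rng(0).random((8, 8))
rec = compress_block(block)
print("reconstruction MSE:", np.mean((block - rec) ** 2))
```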
Diving deep into human action recognition in aerial videos: A survey
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-09-23 DOI: 10.1016/j.jvcir.2024.104298
Abstract: Human action recognition from unmanned aerial vehicles is a dynamic research domain with significant benefits in scale, mobility, deployment, and covert observation. This paper offers a comprehensive review of state-of-the-art algorithms for human action recognition and provides a novel taxonomy that categorizes the reviewed methods into two broad categories, localization-based and globalization-based, defined by how actions are segmented from visual data and how their spatial and temporal structures are modeled. We examine these techniques, highlighting their strengths and limitations, and provide essential background on human action recognition, including fundamental concepts and challenges in aerial videos. Additionally, we discuss existing datasets, enabling a comparative analysis. This survey identifies gaps and suggests future research directions, serving as a catalyst for advancing human action recognition in aerial videos. To our knowledge, this is the first detailed review of its kind.
Citations: 0
Zero-CSC: Low-light image enhancement with zero-reference color self-calibration
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-09-20 DOI: 10.1016/j.jvcir.2024.104293
Abstract: Zero-reference low-light image enhancement (LLIE) techniques mainly focus on grey-scale inhomogeneities, and few methods consider how to explicitly recover a dark scene to achieve enhancement in color and overall illumination. In this paper, we introduce a novel zero-reference color self-calibration framework for enhancing low-light images, termed Zero-CSC. It effectively emphasizes channel-wise representations that contain fine-grained color information, achieving a natural result in a progressive manner. Furthermore, we propose a Light Up (LU) module with large-kernel convolutional blocks to improve overall illumination, implemented with a simple U-Net and further simplified into a lightweight structure. Experiments on representative datasets show that our model consistently achieves state-of-the-art performance in image signal-to-noise ratio, structural similarity, and color accuracy, setting new records on the challenging SICE dataset with improvements of 23.7% in image signal-to-noise ratio and 5.3% in structural similarity over the most advanced methods.
Citations: 0
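The Light Up module is described as large-kernel convolutional blocks inside a simple, lightweight U-Net. The PyTorch sketch below shows one plausible large-kernel block; the depthwise design, channel count, and kernel size are assumptions, not the paper's architecture.

```python
# Hedged sketch of a large-kernel convolutional block of the kind the LU
# module description suggests. All hyperparameters here are assumptions.
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    def __init__(self, channels: int = 32, kernel_size: int = 13):
        super().__init__()
        # Depthwise large-kernel conv keeps the parameter count light
        # while giving a wide receptive field over illumination.
        self.spatial = nn.Conv2d(channels, channels, kernel_size,
                                 padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.pointwise(self.act(self.spatial(x)))  # residual refinement

x = torch.randn(1, 32, 64, 64)
print(LargeKernelBlock()(x).shape)  # torch.Size([1, 32, 64, 64])
```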
M-YOLOv8s: An improved small target detection algorithm for UAV aerial photography
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-09-18 DOI: 10.1016/j.jvcir.2024.104289
Abstract: UAV target detection typically involves small targets against complicated backgrounds. This paper proposes M-YOLOv8s, an object detection model for UAV aerial photography scenes. First, to adapt YOLOv8s to small-target detection, a small target detection head (STDH) module is introduced to fuse location and appearance feature information from the shallow layers of the backbone network. Second, Inner-Wise intersection over union (Inner-WIoU) is designed as the bounding-box regression loss, using auxiliary box calculations to accelerate regression. Third, a multi-scale feature pyramid network (MS-FPN) effectively combines shallow and deep network information, improving detection performance. Furthermore, a multi-scale cross-spatial attention (MCSA) module expands the feature space through multi-scale branches and aggregates target features through cross-spatial interaction, strengthening the model's feature extraction. Experimental results show that the model not only has fewer parameters but also improves mAP0.5 by 6.6% and 5.4% over the baseline on the VisDrone2019 validation and test sets, respectively. M-YOLOv8s thus achieves better detection performance than existing models, indicating that the proposed method is well suited to small-target detection.
Citations: 0
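For orientation, the sketch below computes plain IoU for axis-aligned boxes plus an auxiliary box rescaled about the same center, which is the general idea behind Inner-IoU-style losses; the ratio rescaling is an assumption for illustration, not the paper's Inner-WIoU formula.

```python
# Plain IoU plus a center-preserving auxiliary-box rescale. The `ratio`
# mechanism is an illustrative assumption, not the paper's exact loss.
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def scale_box(box: np.ndarray, ratio: float) -> np.ndarray:
    """Auxiliary box: same center, sides scaled by `ratio`."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = (box[2] - box[0]) * ratio, (box[3] - box[1]) * ratio
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

pred, gt = np.array([0, 0, 4, 4.0]), np.array([1, 1, 5, 5.0])
print(iou(pred, gt), iou(scale_box(pred, 0.8), scale_box(gt, 0.8)))
```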
Low-complexity content-aware encoding optimization of batch video
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-09-17 DOI: 10.1016/j.jvcir.2024.104295
Abstract: With the proliferation of short-form video traffic, video service providers face the challenge of balancing video quality and bandwidth consumption while processing massive volumes of videos. The most straightforward approach is to apply uniform encoding parameters to all videos, but this ignores differences in video content, and alternative encoding parameter configurations may improve global coding efficiency. However, finding the optimal combination of encoding parameters for a batch of videos requires a large amount of redundant encoding, introducing significant computational cost. To address this, we propose a low-complexity encoding parameter prediction model that adaptively adjusts encoding parameter values based on video content. Experiments show that, changing only the CRF encoding parameter, our prediction model achieves 27.04%, 6.11%, and 15.92% bit savings in terms of PSNR, SSIM, and VMAF respectively, at acceptable complexity compared to using the same CRF value for all videos.
Citations: 0
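The deployment idea is to predict a per-video CRF from content features rather than applying one global value. The sketch below uses a trivial placeholder heuristic in place of the paper's prediction model and encodes with ffmpeg's standard -crf option.

```python
# Hedged sketch of per-video CRF selection. The predictor is a placeholder
# heuristic, not the paper's model; the ffmpeg flags are standard libx264 usage.
import subprocess

def predict_crf(spatial_complexity: float, temporal_complexity: float) -> int:
    # Placeholder heuristic: busier content tolerates a higher CRF.
    crf = 23 + 4 * spatial_complexity + 2 * temporal_complexity
    return int(min(max(crf, 18), 32))

def encode(src: str, dst: str, crf: int) -> None:
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst],
        check=True,
    )

# encode("input.mp4", "output.mp4", predict_crf(0.7, 0.3))
```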
Leveraging occupancy map to accelerate video-based point cloud compression
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-09-16 DOI: 10.1016/j.jvcir.2024.104292
Abstract: Video-based Point Cloud Compression (V-PCC) enables point cloud streaming over the internet by converting dynamic 3D point clouds into 2D geometry and attribute videos, which are then compressed using 2D video codecs such as H.266/VVC. However, the complex encoding process of H.266/VVC, such as the quadtree with nested multi-type tree (QTMT) partition, greatly hinders the practical application of V-PCC. To address this issue, we propose a fast CU partition method dedicated to V-PCC that accelerates the coding process. Specifically, we classify coding units (CUs) of projected images into three categories based on the occupancy map of a point cloud: unoccupied, partially occupied, and fully occupied. We then employ either statistic-based rules or machine-learning models to manage the partition of each category. For unoccupied CUs, we terminate the partition directly; for partially occupied CUs with explicit directions, we selectively skip certain partition candidates; for the remaining CUs (partially occupied CUs with complex directions and fully occupied CUs), we train an edge-driven LightGBM model to predict the probability of each partition candidate automatically. Only partitions with high probabilities are retained for further rate–distortion (R–D) decisions. Comprehensive experiments demonstrate the superior performance of the proposed method: under the V-PCC common test conditions, it reduces encoding time by 52% in geometry and 44% in attribute, while incurring only 0.68% (0.66%) BD-Rate loss in D1 (D2) measurements and 0.79% (luma) BD-Rate loss in attribute, significantly surpassing state-of-the-art works.
Citations: 0
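The occupancy-based triage can be illustrated directly: count the occupied pixels inside a CU's footprint on the occupancy map and route the CU into one of the three categories. The sketch below shows that classification step only; the statistic-based skipping rules and the LightGBM predictor are summarized in comments.

```python
# Minimal sketch of the occupancy-based CU classification described above.
import numpy as np

def classify_cu(occupancy: np.ndarray, x: int, y: int, w: int, h: int) -> str:
    block = occupancy[y:y + h, x:x + w]
    occupied = int(block.sum())
    if occupied == 0:
        return "unoccupied"        # terminate the partition immediately
    if occupied == block.size:
        return "fully_occupied"    # defer to the learned partition model
    return "partially_occupied"    # skip candidates that cut against the edge

occ = np.zeros((64, 64), dtype=np.uint8)
occ[:, :32] = 1                    # left half of the patch is occupied
print(classify_cu(occ, 0, 0, 64, 64))  # -> partially_occupied
```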
SR4KVQA: Video quality assessment database and metric for 4K super-resolution
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-09-14 DOI: 10.1016/j.jvcir.2024.104290
Abstract: Quality assessment of 4K super-resolution (SR) videos can guide the optimization of video SR algorithms. To improve the subjective–objective consistency of SR quality assessment, this paper proposes a 4K video database and a blind metric. The SR4KVQA database contains 30 pristine 4K videos, from which 600 distorted 4K SR videos with mean opinion score (MOS) labels are generated by three classic interpolation methods, six deep neural network (DNN)-based SR algorithms, and two generative adversarial network (GAN)-based SR algorithms. Benchmark experiments on the proposed database indicate that video quality assessment (VQA) of 4K SR videos is challenging for existing metrics; among them, the Video-Swin-Transformer backbone demonstrates strong potential for the VQA task. Accordingly, a blind VQA metric based on the Video-Swin-Transformer backbone is established, applying a normalized loss function and an optimized spatio-temporal sampling strategy. Experimental results show that the Pearson linear correlation coefficient (PLCC) and Spearman rank-order correlation coefficient (SROCC) of the proposed metric reach 0.8011 and 0.8275, respectively, on the SR4KVQA database, outperforming or competing with state-of-the-art VQA metrics. The database and code are available at https://github.com/AlexReadyNico/SR4KVQA.
Citations: 0
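PLCC and SROCC, the two agreement measures reported above, are standard correlation statistics. The sketch below computes them with SciPy on toy scores; only the metric computation is shown, not the proposed VQA model.

```python
# PLCC and SROCC between subjective MOS labels and model predictions.
# The score arrays are toy data for illustration.
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([2.1, 3.4, 4.0, 1.5, 3.8])   # subjective scores
pred = np.array([2.3, 3.1, 4.2, 1.9, 3.5])  # model predictions

plcc, _ = pearsonr(pred, mos)
srocc, _ = spearmanr(pred, mos)
print(f"PLCC = {plcc:.4f}, SROCC = {srocc:.4f}")
```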
Data compensation and feature fusion for sketch based person retrieval
IF 2.6, CAS Zone 4, Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2024-09-12 DOI: 10.1016/j.jvcir.2024.104287
Abstract: Sketch re-identification (Re-ID) aims to retrieve pedestrian photos from a gallery dataset using a query sketch drawn by professionals. The task has not been adequately studied because collecting such sketches is difficult and expensive; in addition, the significant modality difference between sketches and images makes it difficult to extract discriminative features. To address these issues, we introduce a novel sketch-style pedestrian dataset named the Pseudo-Sketch dataset. It maximizes the use of existing person dataset resources and is freely available, effectively reducing the expense of the training and deployment phases. Furthermore, to mitigate the modality gap between sketches and visible images, a cross-modal feature fusion network is proposed that incorporates information from each modality. Experimental results show that the Pseudo-Sketch dataset effectively complements real sketch datasets, and the proposed network obtains competitive results compared with SOTA methods. The dataset will be released later.
Citations: 0
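As a hedged illustration of cross-modal feature fusion, the PyTorch sketch below projects sketch and photo features into a shared space and mixes them with a learned gate; the dimensions and gating design are assumptions, not the paper's network.

```python
# Hedged sketch of one way to fuse sketch and photo features. All design
# choices here are assumptions for illustration.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj_sketch = nn.Linear(dim, dim)
        self.proj_photo = nn.Linear(dim, dim)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, f_sketch: torch.Tensor, f_photo: torch.Tensor) -> torch.Tensor:
        s, p = self.proj_sketch(f_sketch), self.proj_photo(f_photo)
        g = self.gate(torch.cat([s, p], dim=-1))  # per-feature mixing weight
        return g * s + (1 - g) * p                # fused representation

fused = CrossModalFusion()(torch.randn(4, 512), torch.randn(4, 512))
print(fused.shape)  # torch.Size([4, 512])
```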