Journal of Visual Communication and Image Representation: Latest Articles

Iterative decoupling deconvolution network for image restoration
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-09-12 | DOI: 10.1016/j.jvcir.2024.104288
{"title":"Iterative decoupling deconvolution network for image restoration","authors":"","doi":"10.1016/j.jvcir.2024.104288","DOIUrl":"10.1016/j.jvcir.2024.104288","url":null,"abstract":"<div><p>The iterative decoupled deblurring BM3D (IDDBM3D) (Danielyan et al., 2011) combines the analysis representation and the synthesis representation, where deblurring and denoising operations are decoupled, so that both problems can be easily solved. However, the IDDBM3D has some limitations. First, the analysis transformation and the synthesis transformation are analytical, thus have limited representation ability. Second, it is difficult to effectively remove image noise from threshold transformation. Third, there exists hyper-parameters to be tuned manually, which is difficult and time consuming. In this work, we propose an iterative decoupling deconvolution network(IDDNet), by unrolling the iterative decoupling algorithm of the IDDBM3D. In the proposed IDDNet, the analysis/synthesis transformation are implemented by encoder/decoder modules; the denoising is implemented by convolutional neural network based denoiser; the hyper-parameters are estimated by hyper-parameter module. We apply our models for image deblurring and super-resolution. Experimental results show that the IDDNet significantly outperforms the state-of-the-art unfolding networks.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142230612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
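The abstract above describes unrolling an iterative decoupling algorithm into a network with encoder/decoder transforms, a CNN denoiser, and learned hyper-parameters. Below is a minimal PyTorch sketch of one such decoupled stage; the gradient-style data-fidelity step, the small residual denoiser, and the learned step size are illustrative assumptions, not the paper's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallDenoiser(nn.Module):
    """Plain residual CNN standing in for the paper's denoising module."""
    def __init__(self, channels=1, features=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, channels, 3, padding=1),
        )

    def forward(self, x):
        return x - self.net(x)  # predict and subtract the noise

class DecouplingStage(nn.Module):
    """One unrolled stage: a deblurring (data-fidelity) step, then denoising."""
    def __init__(self):
        super().__init__()
        self.denoiser = SmallDenoiser()
        self.step = nn.Parameter(torch.tensor(0.1))  # learned step size (a stage "hyper-parameter")

    def forward(self, x, y, kernel):
        # Gradient step on 0.5 * ||K x - y||^2:  x <- x - step * K^T (K x - y)
        residual = F.conv2d(x, kernel, padding="same") - y
        flipped = torch.flip(kernel, dims=[-2, -1])  # correlation with flipped kernel ~ K^T
        x = x - self.step * F.conv2d(residual, flipped, padding="same")
        # Decoupled denoising step
        return self.denoiser(x)

if __name__ == "__main__":
    y = torch.rand(1, 1, 64, 64)              # blurred, noisy observation
    k = torch.ones(1, 1, 5, 5) / 25.0         # toy box-blur kernel
    x_hat = DecouplingStage()(y.clone(), y, k)
    print(x_hat.shape)                        # torch.Size([1, 1, 64, 64])
```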
LG-AKD: Application of a lightweight GCN model based on adversarial knowledge distillation to skeleton action recognition
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-09-12 | DOI: 10.1016/j.jvcir.2024.104286
{"title":"LG-AKD: Application of a lightweight GCN model based on adversarial knowledge distillation to skeleton action recognition","authors":"","doi":"10.1016/j.jvcir.2024.104286","DOIUrl":"10.1016/j.jvcir.2024.104286","url":null,"abstract":"<div><p>Human action recognition, a pivotal topic in computer vision, is a highly complex and challenging task. It requires the analysis of not only spatial dependencies of targets but also temporal changes in these targets. In recent decades, the advancement of deep learning has led to the development of numerous action recognition methods based on deep neural networks. Given that the skeleton points of the human body can be treated as a graph structure, graph neural networks (GNNs) have emerged as an effective tool for modeling such data, garnering significant interest from researchers. This paper aims to address the issue of low test speed caused by over-complicated deep graph convolutional models. To achieve this, we compress the network structure using knowledge distillation from a teacher-student architecture, leading to a compact and lightweight student GNN. To enhance the model’s robustness and generalization capabilities, we introduce a data augmentation mechanism that generates diverse action sequences while maintaining consistent behavior labels, thereby providing a more comprehensive learning basis for the model. The proposed model integrates three distinct knowledge learning paths: teacher networks, original datasets, and derived data. The fusion of knowledge distillation and data augmentation enables lightweight student networks to outperform their teacher networks in terms of both performance and efficiency. Experimental results demonstrate the efficacy of our approach in the context of skeleton-based human action recognition, highlighting its potential to simplify state-of-the-art models while enhancing their performance.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142167738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
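The entry above compresses a deep GCN into a lightweight student via teacher-student knowledge distillation. A minimal sketch of a standard distillation objective (hard-label cross-entropy plus a temperature-softened KL term) is given below; the temperature, weighting, and loss form are generic assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of hard-label cross-entropy and soft-label KL divergence."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients stay comparable across temperatures
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: 8 skeleton sequences, 60 action classes.
s = torch.randn(8, 60, requires_grad=True)   # student predictions
t = torch.randn(8, 60)                       # frozen teacher predictions
y = torch.randint(0, 60, (8,))               # ground-truth action labels
print(distillation_loss(s, t, y).item())
```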
High-capacity reversible data hiding in encrypted images based on adaptive block coding selection
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-09-12 | DOI: 10.1016/j.jvcir.2024.104291
{"title":"High-capacity reversible data hiding in encrypted images based on adaptive block coding selection","authors":"","doi":"10.1016/j.jvcir.2024.104291","DOIUrl":"10.1016/j.jvcir.2024.104291","url":null,"abstract":"<div><p>Recently, data hiding techniques have flourished and addressed various challenges. However, reversible data hiding for encrypted images (RDHEI) using vacating room after encryption (VRAE) framework often falls short in terms of data embedding performance. To address this issue, this paper proposes a novel and high-capacity data hiding method based on adaptive block coding selection. Specifically, iterative encryption and block permutation are applied during image encryption to maintain high pixel correlation within blocks. For each block in the encrypted image, both entropy coding and zero-valued high bit-planes compression coding are pre-applied, then the coding method that vacates the most space is selected, leveraging the strengths of both coding techniques to maximize the effective embeddable room of each encrypted block. This adaptive block coding selection mechanism is suitable for images with varying characteristics. Extensive experiments demonstrate that the proposed VRAE-based method outperforms some state-of-the-art RDHEI methods in data embedding capacity. The average embedding rates (ERs) of the proposed method for three publicly-used datasets including BOSSbase, BOWS-2 and UCID, are 4.041 bpp, 3.929 bpp, and 3.181 bpp, respectively.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142239469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
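The key mechanism above is choosing, per encrypted block, whichever coding vacates more room. A simplified sketch of that selection is shown below; the zero-bit-plane count and the Shannon-entropy estimate are stand-ins for the paper's actual bit-plane and entropy coders.

```python
import numpy as np

def zero_high_bitplanes(block):
    """Number of contiguous high bit-planes (MSB downward) that are all zero."""
    planes = 0
    for bit in range(7, -1, -1):
        if np.any(block & (1 << bit)):
            break
        planes += 1
    return planes

def entropy_bits(block):
    """Shannon-entropy estimate of the bits needed to code the block's pixels."""
    _, counts = np.unique(block, return_counts=True)
    p = counts / counts.sum()
    return float(block.size * -(p * np.log2(p)).sum())

def select_coding(block):
    """Return (method, vacated_bits) for whichever candidate coding frees more room."""
    total_bits = block.size * 8
    vacated_bitplane = zero_high_bitplanes(block) * block.size
    vacated_entropy = max(0.0, total_bits - entropy_bits(block))
    if vacated_bitplane >= vacated_entropy:
        return "zero-bitplane", vacated_bitplane
    return "entropy", vacated_entropy

block = np.random.randint(0, 16, size=(8, 8), dtype=np.uint8)  # low-valued toy block
print(select_coding(block))
```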
Improved semantic-guided network for skeleton-based action recognition
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-09-02 | DOI: 10.1016/j.jvcir.2024.104281
{"title":"Improved semantic-guided network for skeleton-based action recognition","authors":"","doi":"10.1016/j.jvcir.2024.104281","DOIUrl":"10.1016/j.jvcir.2024.104281","url":null,"abstract":"<div><p>A fundamental issue in skeleton-based action recognition is the extraction of useful features from skeleton joints. Unfortunately, the current state-of-the-art models for this task have a tendency to be overly complex and parameterized, which results in low model training and inference time efficiency for large-scale datasets. In this work, we develop a simple but yet an efficient baseline for skeleton-based Human Action Recognition (HAR). The architecture is based on adaptive GCNs (Graph Convolutional Networks) to capture the complex interconnections within skeletal structures automatically without the need of a predefined topology. The GCNs are followed and empowered with an attention mechanism to learn more informative representations. This paper reports interesting accuracy on a large-scale dataset NTU-RGB+D 60, 89.7% and 95.0% on respectively Cross-Subject, and Cross-View benchmarks. On NTU-RGB+D 120, 84.6% and 85.8% over Cross-Subject and Cross-Setup settings, respectively. This work provides an improvement of the existing model SGN (Semantic-Guided Neural Networks) when extracting more discriminant spatial and temporal features.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1047320324002372/pdfft?md5=13e62eaf463a376574412ad44a346dd4&pid=1-s2.0-S1047320324002372-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
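The baseline above uses adaptive GCNs (no predefined skeleton topology) followed by attention. The sketch below shows one possible adaptive graph-convolution block with a learnable adjacency matrix and a simple channel-attention gate; the layer sizes and the attention form are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AdaptiveGCNBlock(nn.Module):
    def __init__(self, in_ch, out_ch, num_joints=25):
        super().__init__()
        # Learnable adjacency: initialized near identity, refined during training.
        self.adj = nn.Parameter(torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints))
        self.proj = nn.Linear(in_ch, out_ch)
        self.att = nn.Sequential(nn.Linear(out_ch, out_ch), nn.Sigmoid())  # channel attention

    def forward(self, x):
        # x: (batch, joints, channels)
        x = torch.einsum("vu,buc->bvc", torch.softmax(self.adj, dim=-1), x)  # learned joint aggregation
        x = self.proj(x)
        return x * self.att(x.mean(dim=1, keepdim=True))                    # gate each channel

feats = torch.randn(4, 25, 3)                    # 4 frames' joint coordinates (x, y, z)
print(AdaptiveGCNBlock(3, 64)(feats).shape)      # torch.Size([4, 25, 64])
```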
A novel approach for long-term secure storage of domain independent videos
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-09-02 | DOI: 10.1016/j.jvcir.2024.104279
{"title":"A novel approach for long-term secure storage of domain independent videos","authors":"","doi":"10.1016/j.jvcir.2024.104279","DOIUrl":"10.1016/j.jvcir.2024.104279","url":null,"abstract":"<div><p>Long-term protection of multimedia contents is a complex task, especially when the video has critical elements. It demands sophisticated technology to ensure confidentiality. In this paper, we propose a blended approach which uses proactive visual cryptography scheme along with video summarization techniques to circumvent the aforementioned issues. Proactive visual cryptography is used to protect digital data by updating periodically or renewing the shares, which are stored in different servers. And, video summarization schemes are useful in various scenarios where memory is a major concern. We use a domain independent scheme for summarizing videos and is applicable to both edited and unedited videos. In our scheme, the visual continuity of the raw video is preserved even after summarization. The original video can be reconstructed through the shares using auxiliary data, which was generated during video summarization phase. The mathematical studies and experimental results demonstrate the applicability of our proposed method.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142163278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
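The scheme above relies on proactive visual cryptography, where shares stored on different servers are periodically renewed. A generic XOR-based (2, 2) sharing with proactive renewal is sketched below to illustrate the idea; it is a textbook construction, not the paper's scheme, and the renewal mask policy is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_shares(frame):
    """Split a frame into two shares; either share alone is uniformly random."""
    share1 = rng.integers(0, 256, frame.shape, dtype=np.uint8)
    share2 = frame ^ share1
    return share1, share2

def renew(share1, share2):
    """Proactive renewal: XOR both shares with a fresh mask; the secret is unchanged."""
    mask = rng.integers(0, 256, share1.shape, dtype=np.uint8)
    return share1 ^ mask, share2 ^ mask

frame = rng.integers(0, 256, (4, 4), dtype=np.uint8)   # toy "key frame"
s1, s2 = make_shares(frame)
s1, s2 = renew(s1, s2)                                  # periodic share refresh
assert np.array_equal(s1 ^ s2, frame)                   # reconstruction still works
```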
VTPL: Visual and text prompt learning for visual-language models
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-09-02 | DOI: 10.1016/j.jvcir.2024.104280
{"title":"VTPL: Visual and text prompt learning for visual-language models","authors":"","doi":"10.1016/j.jvcir.2024.104280","DOIUrl":"10.1016/j.jvcir.2024.104280","url":null,"abstract":"<div><p>Visual-language (V-L) models have achieved remarkable success in learning combined visual–textual representations from large web datasets. Prompt learning, as a solution for downstream tasks, can address the forgetting of knowledge associated with fine-tuning. However, current methods focus on a single modality and fail to fully use multimodal information. This paper aims to address these limitations by proposing a novel approach called visual and text prompt learning (VTPL) to train the model and enhance both visual and text prompts. Visual prompts align visual features with text features, whereas text prompts enrich the semantic information of the text. Additionally, this paper introduces a poly-1 information noise contrastive estimation (InfoNCE) loss and a center loss to increase the interclass distance and decrease the intraclass distance. Experiments on 11 image datasets show that VTPL outperforms state-of-the-art methods, achieving 1.61%, 1.63%, 1.99%, 2.42%, and 2.87% performance boosts over CoOp for 1, 2, 4, 8, and 16 shots, respectively.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
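The method above trains prompts with a poly-1 InfoNCE loss. Below is a hedged sketch of such a term following the PolyLoss recipe (cross-entropy plus epsilon times one minus the matched-pair probability); the temperature, epsilon, and image-to-text pairing are assumptions, and the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def poly1_infonce(img_emb, txt_emb, temperature=0.07, epsilon=1.0):
    """InfoNCE over image-text similarities plus a Poly-1 correction term."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                       # (N, N) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)     # matched pairs lie on the diagonal
    ce = F.cross_entropy(logits, targets)
    pt = F.softmax(logits, dim=-1)[targets, targets].mean()    # mean probability of the matched pair
    return ce + epsilon * (1.0 - pt)

print(poly1_infonce(torch.randn(8, 512), torch.randn(8, 512)).item())
```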
SAFLFusionGait: Gait recognition network with separate attention and different granularity feature learnability fusion
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-09-01 | DOI: 10.1016/j.jvcir.2024.104284
{"title":"SAFLFusionGait: Gait recognition network with separate attention and different granularity feature learnability fusion","authors":"","doi":"10.1016/j.jvcir.2024.104284","DOIUrl":"10.1016/j.jvcir.2024.104284","url":null,"abstract":"<div><p>Gait recognition, an essential branch of biometric identification, uses walking patterns to identify individuals. Despite its effectiveness, gait recognition faces challenges such as vulnerability to changes in appearance due to factors like angles and clothing conditions. Recent progress in deep learning has greatly enhanced gait recognition, especially through methods like deep convolutional neural networks, which demonstrate impressive performance. However, current approaches often overlook the connection between coarse-grained and fine-grained features, thereby restricting their overall effectiveness. To address this limitation, we propose a new framework for gait recognition framework that combines deep-supervised fine-grained separation with coarse-grained feature learnability. Our framework includes the LFF module, which consists of the SSeg module for fine-grained information extraction and a mechanism for fusing coarse-grained features. Furthermore, we introduce the F-LCM module to extract local disparity features more effectively with learnable weights. Evaluation on CASIA-B and OU-MVLP datasets shows superior performance compared to classical networks.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
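The framework above fuses coarse- and fine-grained gait features with learnable weights. The sketch below shows a minimal per-channel learnable fusion gate as an illustration only; the actual LFF and F-LCM modules described in the abstract are considerably more elaborate.

```python
import torch
import torch.nn as nn

class LearnableFusion(nn.Module):
    """Mix coarse- and fine-grained descriptors with a learned per-channel weight."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(dim))   # mixing logits, one per channel

    def forward(self, coarse, fine):
        w = torch.sigmoid(self.gate)                 # weights in (0, 1)
        return w * coarse + (1.0 - w) * fine

coarse = torch.randn(2, 256)   # e.g. whole-silhouette descriptor
fine = torch.randn(2, 256)     # e.g. horizontally partitioned strip descriptor
print(LearnableFusion(256)(coarse, fine).shape)      # torch.Size([2, 256])
```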
Blind image deblurring with a difference of the mixed anisotropic and mixed isotropic total variation regularization
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-08-31 | DOI: 10.1016/j.jvcir.2024.104285
{"title":"Blind image deblurring with a difference of the mixed anisotropic and mixed isotropic total variation regularization","authors":"","doi":"10.1016/j.jvcir.2024.104285","DOIUrl":"10.1016/j.jvcir.2024.104285","url":null,"abstract":"<div><p>This paper proposes a simple model for image deblurring with a new total variation regularization. Classically, the <em>L</em><sub>1-21</sub> regularizer represents a difference of anisotropic (i.e. <em>L</em><sub>1</sub>) and isotropic (i.e. <em>L</em><sub>21</sub>) total variation, so we define a new regularization as <em>L</em><sub>e-2e</sub>, which is the weighted difference of the mixed anisotropic (i.e. <em>L</em><sub>0</sub> + <em>L</em><sub>1</sub> = <em>L</em><sub>e</sub>) and mixed isotropic (i.e. <em>L</em><sub>0</sub> + <em>L</em><sub>21</sub> = <em>L</em><sub>2e</sub>), and it is characterized by sparsity-promoting<!--> <!-->and robustness in image deblurring. Then, we merge the <em>L</em><sub>0</sub>-gradient into the model for edge-preserving and detail-removing. The union of the <em>L</em><sub>e-2e</sub> regularization and <em>L</em><sub>0</sub>-gradient improves the performance of image deblurring and yields high-quality blur kernel estimates. Finally, we design a new solution format that alternately iterates the difference of convex algorithm, the split Bregman method, and the approach of half-quadratic splitting to optimize the proposed model. Experimental results on quantitative datasets and real-world images show that the proposed method can obtain results comparable to state-of-the-art works.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
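From the definitions quoted in the abstract, the regularizer is the weighted difference of a mixed anisotropic and a mixed isotropic total variation. A hedged LaTeX reconstruction is given below; the placement of the weight and the gradient-norm notation are assumptions, not taken from the paper.

```latex
% Hedged reconstruction from the abstract's definitions; \alpha and the
% \|\nabla u\| notation are assumptions.
\[
\begin{aligned}
L_{e}(u)            &= \|\nabla u\|_{0} + \|\nabla u\|_{1}
                    && \text{(mixed anisotropic TV)} \\
L_{2e}(u)           &= \|\nabla u\|_{0} + \|\nabla u\|_{2,1}
                    && \text{(mixed isotropic TV)} \\
L_{e\text{-}2e}(u)  &= L_{e}(u) - \alpha\, L_{2e}(u), \qquad \alpha > 0
                    && \text{(weighted difference)}
\end{aligned}
\]
```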
Secret image sharing with distinct covers based on improved Cycling-XOR
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-08-31 | DOI: 10.1016/j.jvcir.2024.104282
{"title":"Secret image sharing with distinct covers based on improved Cycling-XOR","authors":"","doi":"10.1016/j.jvcir.2024.104282","DOIUrl":"10.1016/j.jvcir.2024.104282","url":null,"abstract":"<div><p>Secret image sharing (SIS) is a technique used to distribute confidential data by dividing it into multiple image shadows. Most of the existing approaches or algorithms protect confidential data by encryption with secret keys. This paper proposes a novel SIS scheme without using any secret key. The secret images are first quantized and encrypted by self-encryption into noisy ones. Then, the encrypted images are mixed into secret shares by cross-encryption. The image shadows are generated by replacing the lower bit-planes of the cover images with the secret shares. In the extraction phase, the receiver can restore the quantized secret images by combinatorial operations of the extracted secret shares. Experimental results show that our method is able to deliver a large amount of data payload with a satisfactory cover image quality. Besides, the computational load is very low since the whole scheme is mostly based on cycling-XOR operations.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142129509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
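Two ingredients named above are keyless self-encryption by cycling-XOR and shadow generation by replacing the cover's lower bit-planes. A simplified sketch of both is shown below; the shift amount, the number of replaced bit-planes, and the omission of the cross-encryption and quantization steps are illustrative simplifications, not the paper's construction.

```python
import numpy as np

def self_encrypt(img, shift=1):
    """Cycling-XOR style self-encryption: XOR the image with a cyclically shifted copy."""
    return img ^ np.roll(img, shift, axis=1)

def embed_in_low_bitplanes(cover, share_bits, planes=2):
    """Replace the cover's lowest `planes` bit-planes with share data."""
    keep_mask = 0xFF ^ ((1 << planes) - 1)           # preserve the cover's high bits
    return (cover & keep_mask) | (share_bits & ((1 << planes) - 1))

rng = np.random.default_rng(1)
secret = rng.integers(0, 256, (8, 8), dtype=np.uint8)
cover = rng.integers(0, 256, (8, 8), dtype=np.uint8)
encrypted = self_encrypt(secret)                      # noisy-looking intermediate
shadow = embed_in_low_bitplanes(cover, encrypted)     # stego shadow carried by the cover
print(shadow.dtype, shadow.shape)
```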
Background adaptive PosMarker based on online generation and detection for locating watermarked regions in photographs
IF 2.6 | CAS Quartile 4 | Computer Science
Journal of Visual Communication and Image Representation | Pub Date: 2024-08-28 | DOI: 10.1016/j.jvcir.2024.104269
{"title":"Background adaptive PosMarker based on online generation and detection for locating watermarked regions in photographs","authors":"","doi":"10.1016/j.jvcir.2024.104269","DOIUrl":"10.1016/j.jvcir.2024.104269","url":null,"abstract":"<div><p>Robust watermarking technology can embed invisible messages in screens to trace the source of unauthorized screen photographs. Locating the four vertices of the embedded region in the photograph is necessary, as existing watermarking methods require geometric correction of the embedded region before revealing the message. Existing localization methods suffer from a performance trade-off: either causing unaesthetic visual quality by embedding visible markers or achieving poor localization precision, leading to message extraction failure. To address this issue, we propose a background adaptive position marker, PosMarker, based on the gray level co-occurrence matrix and the noise visibility function. Besides, we propose an online generation scheme that employs a learnable generator to cooperate with the detector, allowing joint optimization between the two. This simultaneously improves both visual quality and detection precision. Extensive experiments demonstrate the superior localization precision of our PosMarker-based method compared to others.</p></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":null,"pages":null},"PeriodicalIF":2.6,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142151857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
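PosMarker adapts marker strength to the background using the gray level co-occurrence matrix (GLCM) and the noise visibility function. The sketch below measures local texture with scikit-image's GLCM contrast and scales a hypothetical marker amplitude accordingly; the window size, gain, and contrast-to-amplitude mapping are assumptions, not the paper's design.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def marker_amplitude(patch, base=2.0, gain=0.05):
    """Stronger marker modulation where local GLCM contrast (texture) is higher."""
    glcm = graycomatrix(patch, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    contrast = graycoprops(glcm, "contrast")[0, 0]
    return base + gain * np.sqrt(contrast)

rng = np.random.default_rng(2)
flat = np.full((32, 32), 128, dtype=np.uint8)             # smooth background patch
busy = rng.integers(0, 256, (32, 32), dtype=np.uint8)     # highly textured patch
print(marker_amplitude(flat), marker_amplitude(busy))     # busy region tolerates a stronger marker
```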