{"title":"Global–local co-regularization network for facial action unit detection","authors":"Yumei Tan , Haiying Xia , Shuxiang Song","doi":"10.1016/j.jvcir.2026.104728","DOIUrl":"10.1016/j.jvcir.2026.104728","url":null,"abstract":"<div><div>Facial action unit (AU) detection poses challenges in capturing discriminative local features and intricate AU correlations. To solve this challenge, we propose an effective Global–local Co-regularization Network (Co-GLN) trained in a collaborative manner. Co-GLN consists a global branch and a local branch, aiming to establish global feature-level interrelationships in the global branch while excavating region-level discriminative features in the local branch. Specifically, in the global branch, a Global Interaction (GI) module is designed to enhance cross-pixel relations for capturing global semantic information. The local branch comprises three components: the Region Localization (RL) module, the Intra-feature Relation Modeling (IRM) module, and the Region Interaction (RI) module. The RL module extracts regional features according to the pre-defined facial regions, then IRM module extracts local features for each region. Subsequently, the RI module integrates complementary information across regions. Finally, a co-regularization constraint is used to encourage consistency between the global and local branches. Experimental results demonstrate that Co-GLN consistently enhances AU detection performance on the BP4D and DISFA datasets.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104728"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrared small UAV target detection via depthwise separable residual dense attention network","authors":"Keyang Cheng , Nan Chen , Chang Liu , Yue Yu , Hao Zhou , Zhe Wang , Changsheng Peng","doi":"10.1016/j.jvcir.2025.104703","DOIUrl":"10.1016/j.jvcir.2025.104703","url":null,"abstract":"<div><div>Unmanned aerial vehicles (UAVs) are extensively utilized in both military and civilian sectors, offering benefits and posing challenges. Traditional infrared small target detection techniques often suffer from high false alarm rates and low accuracy. To overcome these issues, we propose the Depthwise Separable Residual Dense Attention Network (DSRDANet), which redefines the detection task as a residual image prediction problem. This approach features an Adaptive Adjustment Segmentation Module (AASM) that uses depthwise separable residual dense blocks to extract detailed hierarchical features during encoding. Additionally, multi-scale feature fusion blocks are included to thoroughly aggregate multi-scale features and enhance residual image reconstruction during decoding. Furthermore, the Channel Attention Modulation Module (CAMM) is designed to model channel interdependencies and spatial encoding, optimizing the outputs from AASM by adjusting feature importance distribution across channels, ensuring comprehensive target attention. Experimental results on datasets for infrared small UAV target detection and tracking in various backgrounds validate our approach. Compared to state-of-the-art methods, our technique significantly enhances performance, improving the average F1 score by nearly 0.1, the IOU by 0.12, and the CG by 0.66.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104703"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145981705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DFF-Matcher: Robust cross-source registration with density-fused feature and bidirectional consensus matching","authors":"Rong Guo , Zhenxuan Zeng , Jiang Wu , Xiyu Zhang , Siwen Quan , Zhongwen Hu , Yu Zhu , Jiaqi Yang","doi":"10.1016/j.jvcir.2026.104746","DOIUrl":"10.1016/j.jvcir.2026.104746","url":null,"abstract":"<div><div>Cross-source point cloud registration plays a pivotal role in enabling seamless 3D perception across heterogeneous sensors. However, this task remains highly challenging due to significant density variations, sensor-specific noise, and partial overlaps between heterogeneous sensors. To address these challenges, we propose DFF-Matcher, a robust framework that integrates density-robust feature learning and bidirectional consensus matching to bridge domain gaps across different sensors. Our approach introduces a density-fused feature module to handle significant point density variations and a self-attention enhanced matching strategy to ensure reliable correspondence estimation. This unified framework establishes a new paradigm for cross-source registration, achieving superior performance across diverse sensor modalities. Extensive experiments demonstrate significant improvements, including 25.4% higher feature matching recall and 22.2% greater registration recall on challenging Kinect-LiDAR datasets, while maintaining robust performance in both indoor and outdoor scenarios.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104746"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147398302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unified global–local feature modeling via reverse patch scaling for image manipulation localization","authors":"Jingying Cai , Hang Cheng , Jiabin Chen , Haichou Wang , Meiqing Wang","doi":"10.1016/j.jvcir.2026.104731","DOIUrl":"10.1016/j.jvcir.2026.104731","url":null,"abstract":"<div><div>Image manipulation localization requires comprehensive extraction and integration of global and local features. However, existing methods often adopt parallel architectures that process semantic context and local details separately, leading to limited interaction and fragmented representations. Moreover, applying uniform patching strategies across all layers ignores the varying semantic roles and spatial properties of deep features. To address these issues, we propose a unified framework that derives local representations directly from hierarchical global features. A reverse patch scaling strategy assigns smaller patch sizes and larger overlaps to deeper layers, enabling dense local modeling aligned with increasing semantic abstraction. An asymmetric cross-attention module improves feature interaction and consistency. Additionally, a dual-strategy decoder fuses multi-scale features via concatenation and addition, while a statistically guided edge awareness module models local variance and entropy from the predicted mask to refine boundary perception. Extensive experiments show that our method outperforms state-of-the-art approaches in both accuracy and robustness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104731"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ShoeMatch3D: Attention-Enhanced deep learning framework for high-precision 3D shoeprint comparison","authors":"Binrui Li , Zhihan Tian , Linyu Huang , Yong Guo","doi":"10.1016/j.jvcir.2026.104730","DOIUrl":"10.1016/j.jvcir.2026.104730","url":null,"abstract":"<div><div>Shoeprint analysis plays a vital role in forensic investigations, especially in linking impressions to suspect footwear. Structured-light 3D scanning enables high-resolution capture of shoeprint point clouds, preserving geometric and depth details. However, traditional geometry-based methods often struggle with limited feature representation and noise sensitivity. To address this, we propose ShoeMatch3D, a deep learning framework for fine-grained 3D shoeprint comparison. The core network, CA-PointShoeNet, enhances PointNet++ with channel attention to better extract discriminative features. A cosine similarity-based triplet loss further optimizes the embedding space for robust matching. Experiments on a self-collected dataset demonstrate strong performance, with accuracies of 95.50%, 93.21%, and 90.90% on training, testing, and validation sets, respectively. These results confirm the method’s effectiveness and its potential for broader 3D forensic identification tasks.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104730"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MTPA: A multi-aspects perception assisted AIGV quality assessment model","authors":"Yun Liu, Daoxin Fan, Zihan Liu, Sifan Li, Haiyuan Wang","doi":"10.1016/j.jvcir.2026.104721","DOIUrl":"10.1016/j.jvcir.2026.104721","url":null,"abstract":"<div><div>With the development of Artificial Intelligence (AI) generated technology, AI generated video (AIGV) has aroused much attention. Compared to the visual perceptual in traditional video, AIGV has its unique challenges, such as visual consistency, text-to-video alignment, etc. In this paper, we propose a multi-aspect perception assisted AIGV quality assessment model, which gives a comprehensive quality evaluation of AIGV from three aspects: text–video alignment score, visual spatial perceptual score, and visual temporal perceptual score. Specifically, a pre-trained vision-language module is adopted to study the text-to-video alignment quality, and the semantic-aware module is applied to capture the visual spatial perceptual features. Besides, an effective visual temporal feature extraction module is used to capture multi-scale temporal features. Finally, text–video alignment features, visual spatial, visual temporal perceptual features, and multi-scale visual fusion features are integrated to give a comprehensive quality evaluation. Our model holds state-of-the-art results on three public AIGV datasets, proving its effectiveness.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104721"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146024842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal prompt-guided vision transformer for precise image manipulation localization","authors":"Yafang Xiao , Wei Jiang , Shihua Zhou , Bin Wang , Pengfei Wang , Pan Zheng","doi":"10.1016/j.jvcir.2026.104736","DOIUrl":"10.1016/j.jvcir.2026.104736","url":null,"abstract":"<div><div>With the rise of generative AI and advanced image editing technologies, image manipulation localization has become more challenging. Existing methods often struggle with limited semantic understanding and insufficient spatial detail capture, especially in complex scenarios. To address these issues, we propose a novel multimodal text-guided framework for image manipulation localization. By fusing textual prompts with image features, our approach enhances the model’s ability to identify manipulated regions. We introduce a Multimodal Interaction Prompt Module (MIPM) that uses cross-modal attention mechanisms to align visual and textual information. Guided by multimodal prompts, our Vision Transformer-based model accurately localizes forged areas in images. Extensive experiments on public datasets, including CASIAv1 and Columbia, show that our method outperforms existing approaches. Specifically, on the CASIAv1 dataset, our approach achieves an F1 score of 0.734, surpassing the second-best method by 1.3%. These results demonstrate the effectiveness of our multimodal fusion strategy. The code is available at <span><span>https://github.com/Makabaka613/MPG-ViT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104736"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146079535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing temporal action localization through cross-modal and cross-structural knowledge distillation","authors":"Yue Yu, Cheng Wang, Yuxin Shi","doi":"10.1016/j.jvcir.2026.104734","DOIUrl":"10.1016/j.jvcir.2026.104734","url":null,"abstract":"<div><div>This paper proposes Cross-Modal and Cross-Structure distillation for rgb-based temporal action detection(C2MS-Net), a novel fully supervised approach for enhancing temporal action localization by leveraging cross-modal and cross-structural distillation techniques. By integrating information from multiple modalities and structural representations, C2MS-Net significantly improves the discriminative power of action proposals. A distillation framework is introduced, which transfers knowledge from a teacher model trained on rich multi-modal data to a more efficient student model. This approach not only enhances temporal localization accuracy but also improves the robustness of action detection against visual content variations. Extensive experiments on benchmark datasets demonstrate that the proposed C2MS-Net performs competitively with or surpasses state-of-the-art methods, particularly at lower and mid-range tIoU thresholds, while offering substantial gains in computational efficiency. By eliminating the need for optical flow extraction, the proposed method substantially reduces computational complexity, achieving faster inference speeds and smaller model sizes without compromising accuracy. Code, dataset and models are available at: <span><span>https://github.com/wangcheng666/ActionFormer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104734"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Progressively multi-scale feature fusion for semantic segmentation","authors":"Guoqing Zhang , Shichao Kan , Yigang Cen , Yi Cen , Qi Cao , Yansen Huang , Ming Zeng","doi":"10.1016/j.jvcir.2026.104739","DOIUrl":"10.1016/j.jvcir.2026.104739","url":null,"abstract":"<div><div>A fundamental challenge in semantic segmentation is the discriminative learning of pixel-level features. Various semantic segmentation methods and decoders in the literature have been reported to address this challenge. These methods involve directly upsampling feature maps of different sizes and then concatenating them along the channel dimension to generate pixel-level features. However, direct upsampling of feature maps can result in the misalignment of information at the pixel level, leading to suboptimal performance. In this paper, we introduce a novel solution called the Progressive Multi-Scale Feature Fusion (PMSFF) decoder to overcome this issue. Specifically, we develop a lightweight feed-forward network and atrous convolution layer, that are combined as a fusion module to fuse feature maps from adjacent layers. This fusion module is applied to different segments of a network to aggregate all feature maps for semantic segmentation. The fusion module is characterized by a simple and convenient structure with fewer parameters, which can be flexibly embedded into both Convolutional Neural Networks (CNNs) and Transformers to achieve progressive multi-scale pixel-level feature fusion. Extensive experiments on benchmark datasets have been conducted. The results illustrate the effectiveness and efficiency of the proposed module.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104739"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MGLA-DSNet: Multi-head global-local attention-enabled dual-stream network for weakly supervised video anomaly detection","authors":"Rashmiranjan Nayak, Umesh Chandra Pati, Santos Kumar Das","doi":"10.1016/j.jvcir.2026.104744","DOIUrl":"10.1016/j.jvcir.2026.104744","url":null,"abstract":"<div><div>Video Anomaly Detection (VAD) is the process of identifying anomalous events by analyzing spatiotemporal patterns in video. Furthermore, VAD is a complex task due to difficulties in obtaining frame-level annotations, data imbalance issues, and the equivocal and context-dependent nature of video anomalies. To address these issues, this article presents a weakly supervised learning-based Multi-head Global-Local Attention-enabled Dual-Stream Network (MGLA-DSNet) that effectively utilizes spatial (appearance) and temporal (motion) features, with an emphasis on context dependency. The proposed model uses two streams to extract RGB and optical flow features corresponding to appearance (spatial) and motion (temporal) properties, respectively. Subsequently, multi-head global and location attention with adaptive gating and head-wise specialization is applied to the concatenated RGB and Flow features to efficiently model global and local contexts, respectively, using multiple instance learning Finally, the proposed MGLA-DSNet model outperforms state-of-the-art methods across three benchmark datasets, including CUHK Avenue, ShanghaiTech Campus, and UCF-Crime.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"116 ","pages":"Article 104744"},"PeriodicalIF":3.1,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146174129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}