Title: A novel high-fidelity reversible data hiding scheme based on multi-classification pixel value ordering
Authors: Chen Cui, Li Li, Jianfeng Lu, Shanqing Zhang, Chin-Chen Chang
DOI: 10.1016/j.jvcir.2025.104473
Journal of Visual Communication and Image Representation, Vol. 110, Article 104473, published 2025-05-10.
Abstract: Pixel value ordering (PVO) is a highly effective technique that employs pixel-block partitioning and sorting for reversible data hiding (RDH). However, its embedding performance is strongly affected by block size. To address this, the improved pixel-based PVO (IPPVO) was developed, adopting a per-pixel approach with an adaptive context size. Nevertheless, IPPVO considers only the pixels below and to the right of the current pixel for prediction, neglecting other, closer neighboring regions, which leads to inaccurate predictions. This study presents an RDH strategy based on multi-classification embedding to enhance performance. First, pixels are categorized into four classes according to the parity of their coordinates, and higher-correlation prediction values are obtained using an adaptive nearest-neighbor context size. Second, a new complexity measure, the complexity frequency of pixel regions, is introduced to better differentiate complex regions from flat ones. Finally, an effective embedding ratio and an index-value constraint are introduced to mitigate the excessive distortion that arises when embedding large capacities. Experimental results indicate that the proposed scheme offers superior embedding capacity with low distortion compared with state-of-the-art PVO-based RDH methods.
Title: GDPS: A general distillation architecture for end-to-end person search
Authors: Shichang Fu, Tao Lu, Jiaming Wang, Yu Gu, Jiayi Cai, Kui Jiang
DOI: 10.1016/j.jvcir.2025.104468
Journal of Visual Communication and Image Representation, Vol. 110, Article 104468, published 2025-05-06.
Abstract: Existing knowledge distillation methods for person search handle the detection and re-identification (re-id) tasks separately, which may lead to feature conflicts between the two subtasks. On the one hand, distilling only the detection task makes the network focus more on the features pedestrians have in common, which may hurt re-id performance. On the other hand, distilling only the re-id task makes the network focus on the individual characteristics of pedestrians, which may harm detection performance. To solve this problem, we propose a novel distillation method for person search that treats person search as a single task and distills the different subtasks in a unified framework, called General Distillation for Person Search (GDPS). Specifically, we optimize the general features shared by detection and re-id through feature-based knowledge distillation, aiming for accurate localization of individuals. In addition, we focus on the re-id task and perform relation-based and response-based knowledge distillation to obtain more discriminative person features. Finally, we integrate feature-based, relation-based, and response-based knowledge into a general framework that distills both subtasks simultaneously and can be readily applied to various end-to-end person search methods. Extensive experiments demonstrate the effectiveness of GDPS across different one-step person search methods. In particular, AlignPS with ResNet-50 reaches 94.1% mAP with GDPS on the CUHK-SYSU dataset, surpassing the 93.1% baseline by 1.0% and even exceeding the ResNet-50-DCN-based teacher model at 94.0% mAP.
Title: Low-complexity AV1 intra prediction algorithm
Authors: Wanwei Huang, Xuan Xie, Yu Chen, Baotu Wang, Jian Chen, Pingping Chen
DOI: 10.1016/j.jvcir.2025.104464
Journal of Visual Communication and Image Representation, Vol. 110, Article 104464, published 2025-05-05.
Abstract: As a new-generation video coding standard, Alliance for Open Media Video 1 (AV1) introduces flexible and diverse block partition types that improve coding efficiency but also increase coding complexity. To address this issue, we propose a low-complexity AV1 intra prediction algorithm using Long-edge Sparse Sampling (LSS) and Chroma Migrating from Luma (CML) for efficiently encoding video sequences. First, we develop an LSS method that selects key reference pixels according to the block partition conditions to reduce computational complexity. Second, we exploit a CML algorithm that combines the angle mode of the luma component with the spatial correlations of the chroma components to derive more accurate linear model parameters between the luma and chroma components. Experimental results show that LSS avoids division operations and reduces addition operations by 93%. Combined with CML, our approach saves 4.97% of encoding time and enhances coding performance compared with standard AV1, particularly improving chroma quality.
{"title":"Contrastive Deep Supervision Meets self-knowledge distillation","authors":"Weiwei Zhang , Peng Liang , Jianqing Zhu , Junhuang Wang","doi":"10.1016/j.jvcir.2025.104470","DOIUrl":"10.1016/j.jvcir.2025.104470","url":null,"abstract":"<div><div>Self-knowledge distillation (Self-KD) creates teacher–student pairs within the network to enhance performance. However, existing Self-KD methods focus solely on task-related knowledge, neglecting the importance of task-unrelated knowledge crucial for the intermediate layer’s learning. To address this, we propose Contrastive Deep Supervision Meets Self-Knowledge Distillation (CDSKD), a technique enabling the learning of task-unrelated knowledge to aid network training. CDSKD initially incorporates an auxiliary classifier into the neural network for Self-KD. Subsequently, an attention module is introduced before the auxiliary classifier’s feature extractor to fortify original features, facilitating extraction and classification. A projection head follows the extractor, and the auxiliary classifier is trained using contrastive loss to acquire task-unrelated knowledge, i.e., the invariance of diverse data augmentation, thereby boosting the network’s overall performance. Numerous experimental results on six datasets and eight networks have shown that CDSKD outperforms other deep supervision and Self-KD methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104470"},"PeriodicalIF":2.6,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143928778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MCANet: Feature pyramid network with multi-scale convolutional attention and aggregation mechanisms for semantic segmentation","authors":"Shuo Hu , Xingwang Tao , Xingmiao Zhao","doi":"10.1016/j.jvcir.2025.104466","DOIUrl":"10.1016/j.jvcir.2025.104466","url":null,"abstract":"<div><div>Feature Pyramid Network (FPN) is an important structure for achieving feature fusion in semantic segmentation networks. However, most current FPN-based methods suffer from insufficient capture of cross-scale long-range information and exhibit aliasing effects during cross-scale fusion. In this paper, we propose the Multi-Scale Convolutional Attention and Aggregation Mechanisms Feature Pyramid Network (MAFPN). We first construct a Context Information Enhancement Module, which provides multi-scale global feature information for different levels through a adaptive aggregation Multi-Scale Convolutional Attention Module (AMSCAM). This approach alleviates the problem of insufficient cross-scale semantic information caused by top-down feature fusion. Furthermore, we propose a feature aggregation mechanism that promotes semantic alignment through a Lightweight Convolutional Attention Module (LFAM), thus enhancing the overall effectiveness of information fusion. Finally, we employ a lightweight self-attention mechanism to capture global long-range dependencies. MCANet is a Transformer-based encoder–decoder architecture, where the encoder adopts Uniformer and Biformer in separate configurations, and the decoder consists of MAFPN and FPN heads. When using Biformer as the encoder, MCANet achieves 49.98% mIoU on the ADE20K dataset and 80.95% and 80.45% mIoU on the Cityscapes validation and test sets, respectively. With Uniformer as the encoder, it attains 48.69% mIoU on ADE20K.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104466"},"PeriodicalIF":2.6,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143916738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Deep-learning-based ConvLSTM and LRCN networks for human activity recognition
Authors: Muhammad Hassan Khan, Muhammad Ahtisham Javed, Muhammad Shahid Farid
DOI: 10.1016/j.jvcir.2025.104469
Journal of Visual Communication and Image Representation, Vol. 110, Article 104469, published 2025-04-28.
Abstract: Human activity recognition (HAR) has received significant research attention lately due to its numerous applications in automated systems such as human-behavior assessment, visual surveillance, healthcare, and entertainment. The objective of a vision-based HAR system is to understand human behavior in video data and determine the action being performed. This paper presents two end-to-end deep networks for human activity recognition, one based on the Convolutional Long Short-Term Memory (ConvLSTM) and the other on the Long-term Recurrent Convolutional Network (LRCN). The ConvLSTM (Shi et al., 2015) network exploits convolutions that extract spatial features while accounting for their temporal correlations (i.e., spatiotemporal prediction). The LRCN (Donahue et al., 2015) fuses the advantages of simple convolution layers and LSTM layers into a single model to adequately encode the spatiotemporal data. Usually, the CNN and LSTM models are used independently: the CNN first extracts the spatial information from the frames, and the features gathered by the CNN are later used by the LSTM to predict the video's action. Rather than building two separate networks, we propose a single LRCN-based network that binds the CNN and LSTM layers together into one model, keeping the whole process computationally inexpensive. Additionally, a TimeDistributed layer is introduced into the network, which plays a vital role in encoding action videos and achieving the highest recognition accuracy. A side contribution of the paper is the evaluation of different convolutional neural network variants, including 2D-CNN and 3D-CNN, for human action recognition. An extensive experimental evaluation of the proposed deep networks is carried out on three large benchmark action datasets: UCF50, HMDB51, and UCF-101. The results reveal the effectiveness of the proposed algorithms; in particular, our LRCN-based algorithm outperforms the current state of the art, achieving recognition accuracies of 97.42% on UCF50, 73.63% on HMDB51, and 95.70% on UCF-101.
{"title":"STAD-ConvBi-LSTM: Spatio-temporal attention-based deep convolutional Bi-LSTM framework for abnormal activity recognition","authors":"Roshni Singh, Abhilasha Sharma","doi":"10.1016/j.jvcir.2025.104465","DOIUrl":"10.1016/j.jvcir.2025.104465","url":null,"abstract":"<div><div>Human Activity Recognition has become significant research in computer vision. Real-time systems analyze the actions to endlessly monitor and recognize abnormal activities, thereby enlightening public security and surveillance measures in real-world. However, implementing these frameworks is a challenging task due to miscellaneous actions, complex patterns, fluctuating viewpoints or background cluttering. Recognizing abnormality in videos still needs exclusive focus for accurate prediction and computational efficiency. To address these challenges, this work introduced an efficient novel spatial–temporal attention-based deep convolutional bidirectional long short-term memory framework. Also, proposes a dual attentional convolutional neural network that combines CNN model, bidirectional-LSTM and spatial–temporal attention mechanism to extract human-centric prominent features in video-clips. The result of extensive experimental analysis exhibits that STAD-ConvBi-LSTM outperforms the state-of-the-art methods using five challenging datasets, namely UCF50, UCF101, YouTube-Action, HMDB51, Kinetics-600 and on our Synthesized Action dataset achieving notable accuracies of 98.8%, 98.1%, 81.2%, 97.4%, 88.2% and 96.7%, respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104465"},"PeriodicalIF":2.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143886254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A multiple-level additive distortion method for security improvement in palette image steganography
Authors: Yi Chen, Hongxia Wang, Yunhe Cui, Guowei Shen, Chun Guo, Yong Liu, Hanzhou Wu
DOI: 10.1016/j.jvcir.2025.104463
Journal of Visual Communication and Image Representation, Vol. 110, Article 104463, published 2025-04-25.
Abstract: With the rapid development of the Internet and communication technology, palette images have become a preferred medium for steganography. However, the security of palette image steganography remains a major problem. To address this, we propose a multiple-level additive distortion method for improving the security of palette image steganography. The proposed method comprises an index-level cost method and a pixel-level cost method. The index-level and pixel-level costs respectively reflect changes in the relationships of adjacent indices and of the pixels corresponding to those indices, and both reflect the modification impact of steganography. The proposed method therefore improves the security of palette image steganography. We conducted extensive experiments on three datasets to verify this improvement. The results show that the proposed multiple-level distortion method has a clear security advantage over four state-of-the-art methods.
{"title":"DFTGL: Domain Filtered and Target Guided Learning for few-shot anomaly detection","authors":"Jiajun Zhang, Yanzhi Song, Zhouwang Yang","doi":"10.1016/j.jvcir.2025.104457","DOIUrl":"10.1016/j.jvcir.2025.104457","url":null,"abstract":"<div><div>This paper addresses cross-domain challenges in few-shot anomaly detection, where utilizing various source domains leads to diminished representations and compromised detection in the target domain. To tackle this, we propose Domain Filtering and Target-Guided Learning (DFTGL). Initially, we measure domain gaps and retain source domains with smaller disparities. We introduce a limited number of target domain samples to create an intermediate domain for better feature transfer during training. Additionally, we employ category-prior-based augmentation to refine feature distribution estimation while ensuring image registration. Experimental results demonstrate significant improvements in image-level AUROC compared to the baseline: 5.1%, 5.9%, and 4.1% (2-shot, 4-shot, and 8-shot settings) on MVTec and 6.9%, 2.1%, and 2.5% on ViSA datasets. This pioneering research effectively narrows domain gaps, enabling proficient feature transfer, and holds promise for early anomaly detection in industries like product inspection and medical diagnostics.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104457"},"PeriodicalIF":2.6,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143911482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A GAN-based anti-forensics method by modifying the quantization table in the JPEG header file
Authors: Hao Wang, Xin Cheng, Hao Wu, Xiangyang Luo, Bin Ma, Hui Zong, Jiawei Zhang, Jinwei Wang
DOI: 10.1016/j.jvcir.2025.104462
Journal of Visual Communication and Image Representation, Vol. 110, Article 104462, published 2025-04-23.
Abstract: Detecting double JPEG compression is crucial in digital image forensics. When detecting recompressed images, most detection methods assume that the quantization table in the JPEG header is trustworthy; they fail once the quantization table in the header file has been tampered with. Inspired by this observation, this paper proposes an anti-detection method for double JPEG compression based on a generative adversarial network (GAN) that modifies the quantization table in the JPEG header file. The proposed method draws on the structure of a GAN to modify the quantization table by gradient descent and introduces an adversarial loss to steer the modification so that the modified quantization table deceives detection methods. The method achieves its anti-detection aim simply by replacing the original quantization table after the network has been trained. Experiments show that the proposed method achieves a high anti-detection rate and generates images with high visual quality.