Journal of Visual Communication and Image Representation: Latest Publications

Deep-learning-based ConvLSTM and LRCN networks for human activity recognition
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-28 DOI: 10.1016/j.jvcir.2025.104469
Muhammad Hassan Khan, Muhammad Ahtisham Javed, Muhammad Shahid Farid
{"title":"Deep-learning-based ConvLSTM and LRCN networks for human activity recognition","authors":"Muhammad Hassan Khan,&nbsp;Muhammad Ahtisham Javed,&nbsp;Muhammad Shahid Farid","doi":"10.1016/j.jvcir.2025.104469","DOIUrl":"10.1016/j.jvcir.2025.104469","url":null,"abstract":"<div><div>Human activity recognition (HAR) has received significant research attention lately due to its numerous applications in automated systems such as human-behavior assessment, visual surveillance, healthcare, and entertainment. The objective of a vision-based HAR system is to understand human behavior in video data and determine the action being performed. This paper presents two end-to-end deep networks for human activity recognition, one based on the Convolutional Long Short Term Memory (ConvLSTM) and the other based on Long-term Recurrent Convolution Network (LRCN). The ConvLSTM (Shi et al., 2015) network exploits convolutions that help to extract spatial features considering their temporal correlations (i.e., spatiotemporal prediction). The LRCN (Donahue et al., 2015) fuses the advantages of simple convolution layers and LSTM layers into a single model to adequately encode the spatiotemporal data. Usually, the CNN and LSTM models are used independently: the CNN is used to separate the spatial information from the frames in the first phase. The characteristics gathered by CNN can later be used by the LSTM model to anticipate the video’s action. Rather than building two separate networks and making the whole process computationally inexpensive, we proposed a single LRCN-based network that binds CNN and LSTM layers together into a single model. Additionally, the TimeDistributed layer was introduced in the network which plays a vital role in the encoding of action videos and achieving the highest recognition accuracy. A side contribution of the paper is the evaluation of different convolutional neural network variants including 2D-CNN, and 3D-CNN, for human action recognition. An extensive experimental evaluation of the proposed deep network is carried out on three large benchmark action datasets: UCF50, HMDB51, and UCF-101 action datasets. The results reveal the effectiveness of the proposed algorithms; particularly, our LRCN-based algorithm outperformed the current state-of-the-art, achieving the highest recognition accuracy of 97.42% on UCF50, 73.63% on HMDB51, and 95.70% UCF101 datasets.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104469"},"PeriodicalIF":2.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143895834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
STAD-ConvBi-LSTM: Spatio-temporal attention-based deep convolutional Bi-LSTM framework for abnormal activity recognition
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-28 DOI: 10.1016/j.jvcir.2025.104465
Roshni Singh, Abhilasha Sharma
{"title":"STAD-ConvBi-LSTM: Spatio-temporal attention-based deep convolutional Bi-LSTM framework for abnormal activity recognition","authors":"Roshni Singh,&nbsp;Abhilasha Sharma","doi":"10.1016/j.jvcir.2025.104465","DOIUrl":"10.1016/j.jvcir.2025.104465","url":null,"abstract":"<div><div>Human Activity Recognition has become significant research in computer vision. Real-time systems analyze the actions to endlessly monitor and recognize abnormal activities, thereby enlightening public security and surveillance measures in real-world. However, implementing these frameworks is a challenging task due to miscellaneous actions, complex patterns, fluctuating viewpoints or background cluttering. Recognizing abnormality in videos still needs exclusive focus for accurate prediction and computational efficiency. To address these challenges, this work introduced an efficient novel spatial–temporal attention-based deep convolutional bidirectional long short-term memory framework. Also, proposes a dual attentional convolutional neural network that combines CNN model, bidirectional-LSTM and spatial–temporal attention mechanism to extract human-centric prominent features in video-clips. The result of extensive experimental analysis exhibits that STAD-ConvBi-LSTM outperforms the state-of-the-art methods using five challenging datasets, namely UCF50, UCF101, YouTube-Action, HMDB51, Kinetics-600 and on our Synthesized Action dataset achieving notable accuracies of 98.8%, 98.1%, 81.2%, 97.4%, 88.2% and 96.7%, respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104465"},"PeriodicalIF":2.6,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143886254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A multiple-level additive distortion method for security improvement in palette image steganography
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-25 DOI: 10.1016/j.jvcir.2025.104463
Yi Chen, Hongxia Wang, Yunhe Cui, Guowei Shen, Chun Guo, Yong Liu, Hanzhou Wu
{"title":"A multip-level additive distortion method for security improvement in palette image steganography","authors":"Yi Chen ,&nbsp;Hongxia Wang ,&nbsp;Yunhe Cui ,&nbsp;Guowei Shen ,&nbsp;Chun Guo ,&nbsp;Yong Liu ,&nbsp;Hanzhou Wu","doi":"10.1016/j.jvcir.2025.104463","DOIUrl":"10.1016/j.jvcir.2025.104463","url":null,"abstract":"<div><div>With the rapid development of the Internet and communication technology, palette images have become a preferred media for steganography. However, the security of palette image steganography faces a big problem. To address this, we propose a multiple-level additive distortion method for security improvement in palette image steganography. The proposed multiple-level additive distortion method comprises an index-level cost method and a pixel-level cost method. The index-level and the pixel-level costs by the two methods can respectively reflect the relationship changes of adjacent indices and the pixels corresponding to the adjacent indices. Meanwhile, the index-level and the pixel-level costs can also reflect the modification impact of steganography. Therefore, the proposed method can improve the security of palette image steganography. We conducted extensive experiments on three datasets to verify the security improvement. Experiment results have shown our proposed multiple-level distortion method indeed has an advantage in security when compared with four state-of-the-art methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104463"},"PeriodicalIF":2.6,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A GAN-based anti-forensics method by modifying the quantization table in JPEG header file
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-23 DOI: 10.1016/j.jvcir.2025.104462
Hao Wang, Xin Cheng, Hao Wu, Xiangyang Luo, Bin Ma, Hui Zong, Jiawei Zhang, Jinwei Wang
{"title":"A GAN-based anti-forensics method by modifying the quantization table in JPEG header file","authors":"Hao Wang ,&nbsp;Xin Cheng ,&nbsp;Hao Wu ,&nbsp;Xiangyang Luo ,&nbsp;Bin Ma ,&nbsp;Hui Zong ,&nbsp;Jiawei Zhang ,&nbsp;Jinwei Wang","doi":"10.1016/j.jvcir.2025.104462","DOIUrl":"10.1016/j.jvcir.2025.104462","url":null,"abstract":"<div><div>It is crucial to detect double JPEG compression images in digital image forensics. When detecting recompressed images, most detection methods assume that the quantization table in the JPEG header is safe. The method fails once the quantization table in the header file is tampered with. Inspired by this phenomenon, this paper proposes a double JPEG compression anti-detection method based on the generative adversarial network (GAN) by modifying the quantization table of JPEG header files. The proposed method draws on the structure of GAN to modify the quantization table by gradient descent. Also, our proposed method introduces adversarial loss to determine the direction of the modification so that the modified quantization table can be used for cheat detection methods. The proposed method achieves the aim of anti-detection and only needs to replace the original quantization table after the net training. Experiments show that the proposed method has a high anti-detection rate and generates images with high visual quality.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104462"},"PeriodicalIF":2.6,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EPSA-VPR: A lightweight visual place recognition method with an Efficient Patch Saliency-weighted Aggregator
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-22 DOI: 10.1016/j.jvcir.2025.104440
Jiwei Nie, Qǐxı̄ Zhào, Dingyu Xue, Feng Pan, Wei Liu
{"title":"EPSA-VPR: A lightweight visual place recognition method with an Efficient Patch Saliency-weighted Aggregator","authors":"Jiwei Nie ,&nbsp;Qǐxı̄ Zhào ,&nbsp;Dingyu Xue ,&nbsp;Feng Pan ,&nbsp;Wei Liu","doi":"10.1016/j.jvcir.2025.104440","DOIUrl":"10.1016/j.jvcir.2025.104440","url":null,"abstract":"<div><div>Visual Place Recognition (VPR) is important in autonomous driving, as it enables vehicles to identify their positions using a pre-built database. In this domain, prior research highlights the advantages of recognizing and emphasizing high-saliency local features in descriptor aggregation for performance improvement. Following this path, we introduce EPSA-VPR, a lightweight VPR method incorporating a proposed Efficient Patch Saliency-weighted Aggregator (EPSA), additionally addressing the computational efficiency demands of large-scale scenarios. With almost negligible computational requirements, EPSA efficiently calculates and integrates the local saliency into the global descriptor. To quantitatively evaluate the effectiveness, EPSA-VPR is validated across various VPR benchmarks. The comprehensive evaluations confirm that our method outperforms existing advanced VPR technologies and achieves competitive performance. Notably, EPSA-VPR also derives the second-best performance among two-stage VPR methods, without the need for re-ranking computations. Moreover, the effectiveness of our model is sustainable even with considerable dimension reduction. Visualization analysis reveals the interpretability of EPSA-VPR that after training, the backbone network learns to attach more attention on the task-related elements, which makes the final descriptor more discriminative.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104440"},"PeriodicalIF":2.6,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Opinion-unaware blind quality assessment of AI-generated omnidirectional images based on deep feature statistics
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-22 DOI: 10.1016/j.jvcir.2025.104461
Xuelin Liu, Jiebin Yan, Yuming Fang, Jingwen Hou
{"title":"Opinion-unaware blind quality assessment of AI-generated omnidirectional images based on deep feature statistics","authors":"Xuelin Liu,&nbsp;Jiebin Yan,&nbsp;Yuming Fang,&nbsp;Jingwen Hou","doi":"10.1016/j.jvcir.2025.104461","DOIUrl":"10.1016/j.jvcir.2025.104461","url":null,"abstract":"<div><div>The advancement of artificial intelligence generated content (AIGC) and virtual reality (VR) technologies have prompted AI-generated omnidirectional images (AGOI) to gradually into people’s daily lives. Compared to natural omnidirectional images, AGOIs exhibit traditional low-level technical distortions and high-level semantic distortions, which can severely affect the immersive experience for users in practical applications. Consequently, there is an urgent need for thorough research and precise evaluation of AGOI quality. In this paper, we propose a novel opinion-unaware (OU) blind quality assessment approach for AGOIs based on deep feature statistics. Specifically, we first transform the AGOIs in equirectangular projection (ERP) format into a set of six cubemap projection (CMP)-converted viewport images, and extract viewport-wise multi-layer deep features from the pre-trained neural network backbone. Based on the deep representations, the multivariate Gaussian (MVG) models are subsequently fitted. The individual quality score for each CMP-converted image is calculated by comparing it against the corresponding fitted pristine MVG model. The final quality score for a testing AGOI is then computed by aggregating these individual quality scores. We conduct comprehensive experiments using the existing AGOIQA database and the experimental results show that the proposed OU-BAGOIQA model outperforms current state-of-the-art OU blind image quality assessment models. The ablation study has also been conducted to validate the effectiveness of our method.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104461"},"PeriodicalIF":2.6,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143877249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Attention mechanism based multimodal feature fusion network for human action recognition
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-21 DOI: 10.1016/j.jvcir.2025.104459
Xu Zhao, Chao Tang, Huosheng Hu, Wenjian Wang, Shuo Qiao, Anyang Tong
{"title":"Attention mechanism based multimodal feature fusion network for human action recognition","authors":"Xu Zhao ,&nbsp;Chao Tang ,&nbsp;Huosheng Hu ,&nbsp;Wenjian Wang ,&nbsp;Shuo Qiao ,&nbsp;Anyang Tong","doi":"10.1016/j.jvcir.2025.104459","DOIUrl":"10.1016/j.jvcir.2025.104459","url":null,"abstract":"<div><div>Current human action recognition (HAR) methods focus on integrating multiple data modalities, such as skeleton data and RGB data. However, they struggle to exploit motion correlation information in skeleton data and rely on spatial representations from RGB modalities. This paper proposes a novel Attention-based Multimodal Feature Integration Network (AMFI-Net) designed to enhance modal fusion and improve recognition accuracy. First, RGB and skeleton data undergo multi-level preprocessing to obtain differential movement representations, which are then input into a heterogeneous network for separate multimodal feature extraction. Next, an adaptive fusion strategy is employed to enhance the integration of these multimodal features. Finally, the network assesses the confidence level of weighted skeleton information to determine the extent and type of appearance information to be used in the final feature integration. Experiments conducted on the NTU-RGB + D dataset demonstrate that the proposed method is feasible, leading to significant improvements in human action recognition accuracy.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104459"},"PeriodicalIF":2.6,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143877349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Seg-Cam: Enhancing interpretability analysis in segmentation networks
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-21 DOI: 10.1016/j.jvcir.2025.104467
Weihua Wu, Chunming Ye, Xufei Liao
{"title":"Seg-Cam: Enhancing interpretability analysis in segmentation networks","authors":"Weihua Wu ,&nbsp;Chunming Ye ,&nbsp;Xufei Liao","doi":"10.1016/j.jvcir.2025.104467","DOIUrl":"10.1016/j.jvcir.2025.104467","url":null,"abstract":"<div><div>Existing interpretability analysis methods face significant limitations when applied to segmentation networks, such as limited applicability and unclear visualization of weight distribution. To address these issues, a novel approach for calculating network layer weights was established for segmentation networks, such as encoder-decoder networks. Rather than processing individual parameters, this method computes gradients based on pixel-level information. It improves the weight calculation model in the Grad-Cam method by removing the constraint that the model’s output layer must be a one-dimensional vector. This modification extends its applicability beyond traditional CNN classification models to include those that generate feature maps as output, such as segmentation models. It also improves the visualization process by calculating the distribution of feature map weights for the specified layer without changing the model architecture or retraining. Utilizing the image segmentation task as the project context, the seg-cam visualization scheme is incorporated into the initial model. This scheme enables the visualization of parameter weights for each network layer, facilitating post-training analysis and model calibration. This approach enhances the interpretability of segmentation networks, particularly in cases where the head layer contains many parameters, making interpretation challenging.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104467"},"PeriodicalIF":2.6,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hard-UNet architecture for medical image segmentation using position encoding generator: LSA based encoder
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-17 DOI: 10.1016/j.jvcir.2025.104452
Chia-Jui Chen
{"title":"Hard-UNet architecture for medical image segmentation using position encoding generator: LSA based encoder","authors":"Chia-Jui Chen","doi":"10.1016/j.jvcir.2025.104452","DOIUrl":"10.1016/j.jvcir.2025.104452","url":null,"abstract":"<div><div>Researchers have focused on the rising usage of convolutional neural networks (CNNs) in segmentation, emphasizing the pivotal role of encoders in learning global and local information essential for predictions. The limited ability of CNNs to capture distant spatial relationships due to their local structure has spurred interest in the swin-transformer. Introducing a novel approach named Hard-UNet, blending CNNs and transformers, addresses this gap, inspired by transformer successes in NLP. Hard-UNet leverages HardNet for deep feature extraction and implements a transformer-based module for self-communication within sub-windows. Experimental results demonstrate its significant performance leap over existing methods, notably enhancing segmentation accuracy on medical image datasets like ISIC 2018 and BUSI. Outperforming UNext and ResUNet, Hard-UNet delivers a remarkable 16.24% enhancement in segmentation accuracy, achieving state-of-the-art results of 83.19 % and 83.26 % on the ISIC dataset.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104452"},"PeriodicalIF":2.6,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143869308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning disruptor-suppressed response variation-aware multi-regularized correlation filter for visual tracking
IF 2.6 · CAS Q4 · Computer Science
Journal of Visual Communication and Image Representation Pub Date: 2025-04-17 DOI: 10.1016/j.jvcir.2025.104458
Sathishkumar Moorthy, Sachin Sakthi K.S., Sathiyamoorthi Arthanari, Jae Hoon Jeong, Young Hoon Joo
{"title":"Learning disruptor-suppressed response variation-aware multi-regularized correlation filter for visual tracking","authors":"Sathishkumar Moorthy ,&nbsp;Sachin Sakthi K.S. ,&nbsp;Sathiyamoorthi Arthanari ,&nbsp;Jae Hoon Jeong ,&nbsp;Young Hoon Joo","doi":"10.1016/j.jvcir.2025.104458","DOIUrl":"10.1016/j.jvcir.2025.104458","url":null,"abstract":"<div><div>Discriminative correlation filters (DCF) are widely used in object tracking for their high accuracy and computational efficiency. However, conventional DCF methods, which rely only on consecutive frames, often lack robustness due to limited temporal information and can suffer from noise introduced by historical frames. To address these limitations, we propose a novel disruptor-suppressed response variation-aware multi-regularized tracking (DSRVMRT) method. This approach improves tracking stability by incorporating historical interval information in filter training, thus leveraging a broader temporal context. Our method includes response deviation regularization to maintain consistent response quality and introduces a receptive channel weight distribution to enhance channel reliability. Additionally, we implement a disruptor-aware scheme using response bucketing, which detects and penalizes areas affected by similar objects or partial occlusions, reducing tracking disruptions. Extensive evaluations on public tracking benchmarks demonstrate that DSRVMRT achieves superior accuracy, robustness, and effectiveness compared to existing methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104458"},"PeriodicalIF":2.6,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0