{"title":"Learning disruptor-suppressed response variation-aware multi-regularized correlation filter for visual tracking","authors":"Sathishkumar Moorthy , Sachin Sakthi K.S. , Sathiyamoorthi Arthanari , Jae Hoon Jeong , Young Hoon Joo","doi":"10.1016/j.jvcir.2025.104458","DOIUrl":"10.1016/j.jvcir.2025.104458","url":null,"abstract":"<div><div>Discriminative correlation filters (DCF) are widely used in object tracking for their high accuracy and computational efficiency. However, conventional DCF methods, which rely only on consecutive frames, often lack robustness due to limited temporal information and can suffer from noise introduced by historical frames. To address these limitations, we propose a novel disruptor-suppressed response variation-aware multi-regularized tracking (DSRVMRT) method. This approach improves tracking stability by incorporating historical interval information in filter training, thus leveraging a broader temporal context. Our method includes response deviation regularization to maintain consistent response quality and introduces a receptive channel weight distribution to enhance channel reliability. Additionally, we implement a disruptor-aware scheme using response bucketing, which detects and penalizes areas affected by similar objects or partial occlusions, reducing tracking disruptions. Extensive evaluations on public tracking benchmarks demonstrate that DSRVMRT achieves superior accuracy, robustness, and effectiveness compared to existing methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104458"},"PeriodicalIF":2.6,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic kernel-based adaptive spatial aggregation for learned image compression","authors":"Huairui Wang , Nianxiang Fu , Zhenzhong Chen , Shan Liu","doi":"10.1016/j.jvcir.2025.104456","DOIUrl":"10.1016/j.jvcir.2025.104456","url":null,"abstract":"<div><div>Learned image compression methods have shown remarkable performance and expansion potential compared to traditional codecs. Currently, there are two mainstream image compression frameworks: one uses stacked convolution and other uses window-based self-attention for transform coding, most of which aggregate valuable dependencies in a fixed spatial range. In this paper, we focus on extending content-adaptive aggregation capability and propose a dynamic kernel-based transform coding. The proposed adaptive aggregation generates kernel offsets to capture valuable information with dynamic sampling convolution to help transform. With the adaptive aggregation strategy and the sharing weights mechanism, our method can achieve promising transform capability with acceptable model complexity. Besides, considering the coarse hyper prior, the channel-wise, and the spatial context, we formulate a generalized entropy model. Based on it, we introduce dynamic kernel in hyper-prior to generate more expressive side information context. Furthermore, we propose an asymmetric sparse entropy model according to the investigation of the spatial and variance characteristics of the grouped latents. The proposed entropy model can facilitate entropy coding to reduce statistical redundancy while maintaining inference efficiency. Experimental results demonstrate that our method achieves superior rate–distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104456"},"PeriodicalIF":2.6,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143844645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rotation translation matrices analysis method for action recognition of construction equipment","authors":"Ziwei Liu, Jiazhong Yu, Rundong Cao, Qinghua Liang, Shui Liu, Linsu Shi","doi":"10.1016/j.jvcir.2025.104460","DOIUrl":"10.1016/j.jvcir.2025.104460","url":null,"abstract":"<div><div>Lack of intelligence is one of the primary factors hindering the implementation of video surveillance. Action recognition is a method used to enhance the effectiveness of video surveillance and has garnered the interest of numerous researchers. In recent years, the advancement of deep learning (DL) frameworks has led to the proposal of numerous DL-based action recognition models. However, most of these models exhibit poor performance in recognizing actions of construction equipment, primarily due to the presence of multiple targets in complex real-life scenes. Considering the above information, we have developed a method for action recognition that involves analyzing the motion of the fundamental components of construction vehicles. Firstly, we estimate the essential components of construction vehicles from the video inputs using an instance segmentation method. Secondly, to assess the motion state of the robotic arm of the equipment, we have developed an analysis method based on rotation and translation (RT) matrices. We propose to examine the relationship between action recognition of construction vehicles and RT matrices. The evaluations of the respective datasets were conducted. The experimental results validate the effectiveness of the proposed framework, and our model demonstrates state-of-the-art performance in action recognition of construction equipment. We utilize RT matrices to model the degrees of movement of construction equipment, allowing us to analyze their actions and providing a unique perspective on action recognition. We believe that the proposed framework can facilitate the transition of video surveillance techniques from research to practical applications, ultimately generating economic value.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104460"},"PeriodicalIF":2.6,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143838917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contrastive attention and fine-grained feature fusion for artistic style transfer","authors":"Honggang Zhao , Beinan Zhang , Yi-Jun Yang","doi":"10.1016/j.jvcir.2025.104451","DOIUrl":"10.1016/j.jvcir.2025.104451","url":null,"abstract":"<div><div>In contemporary image processing, creative image alteration plays a crucial role. Recent studies on style transfer have utilized attention mechanisms to capture the aesthetic and artistic expressions of style images. This method converts style images into tokens by initially assessing attention levels and subsequently employing a decoder to transfer the artistic style of the image. However, this approach often discards many fine-grained style elements due to the low semantic similarity between the original and style images. This may result in discordant or conspicuous artifacts. We propose MccSTN, an innovative framework for style representation and transfer, designed to adapt to contemporary arbitrary image style transfers as a solution to this problem. Specifically, we introduce the Mccformer feature fusion module, which integrates fine-grained features from content images with aesthetic characteristics from style images. Mccformer is utilized to generate feature maps. The target image is then produced by inputting the feature map into the decoder. We consider the relationship between individual styles and the overall style distribution to streamline the model and enhance training efficiency. We present a multi-scale augmented contrast module that leverages a substantial number of image pairs to learn style representations. Code will be posted on <span><span>https://github.com/haizhu12/MccSTN</span><svg><path></path></svg></span></div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104451"},"PeriodicalIF":2.6,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143838918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MemFlow-AD: An anomaly detection and localization model based on memory module and normalizing flow","authors":"Xin Xie, Zixi Li, Shenping Xiong, Zhaoyang Liu, Tijian Cai","doi":"10.1016/j.jvcir.2025.104454","DOIUrl":"10.1016/j.jvcir.2025.104454","url":null,"abstract":"<div><div>Presently, in most anomaly detection methods, the training dataset contains a low frequency of anomalous data with diverse categories. However, these methods exhibit limited learning capacity for anomalous information, leading to weak model generalization and low detection accuracy. This paper proposes MemFlow-AD, an anomaly detection and localization model that integrates a memory module and a normalizing flow. MemFlow-AD supplements anomalous data using anomaly simulation, retains general patterns from normal samples via the memory module to discern potential differences between normal and anomalous samples, employs a 2D normalizing flow to extract distributional feature information from the data, and through multiscale feature fusion and attention mechanism to further enhance the feature expression ability of the model. Experimental results demonstrate outstanding performance in detecting and localizing anomalies on the MVTec dataset, achieving accuracies of 98.61% and 94.02%, respectively. Moreover, on the BTAD dataset, the model exhibits a 2.15% improvement in detection accuracy compared to current mainstream methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104454"},"PeriodicalIF":2.6,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143815162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MES-YOLO: An efficient lightweight maritime search and rescue object detection algorithm with improved feature fusion pyramid network","authors":"Zhao Jin , Tian He , Liping Qiao , Jiang Duan , Xinyu Shi , Bohan Yan , Chen Guo","doi":"10.1016/j.jvcir.2025.104453","DOIUrl":"10.1016/j.jvcir.2025.104453","url":null,"abstract":"<div><div>Maritime Search and Rescue (SAR) object detection is challenged by environmental complexity, variability in object scales, and real-time computation constraints of Unmanned Aerial Vehicles (UAVs). Our MES-YOLO algorithm, designed for maritime UAV imagery, employs an innovative Multi Asymptotic Feature Pyramid Network (MAFPN) to enhance detection accuracy across scales. It integrates an Efficient Module (EMO) and Inverted Residual Mobile Blocks (iRMB) to maintain a lightweight model while enhancing key information perception.The SIoU loss function is used to optimize the detection performance of the model. Tests on the SeaDronesSee dataset show that MES-YOLO increased average precision (mAP50) from 81.5% to 87.1%, reduced parameter count by 43.3%, and improved the F1 score by 6.8%, with a model size only 58.3% that of YOLOv8, surpassing YOLO series and other mainstream algorithms in robustness to background illumination and imaging angles.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"109 ","pages":"Article 104453"},"PeriodicalIF":2.6,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143808707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reversible data hiding with automatic contrast enhancement and high embedding capacity based on multi-type histogram modification","authors":"Libo Han , Wanlin Gao , Xinfeng Zhang , Sha Tao","doi":"10.1016/j.jvcir.2025.104450","DOIUrl":"10.1016/j.jvcir.2025.104450","url":null,"abstract":"<div><div>For an image, we can use reversible data hiding (RDH) with automatic contrast enhancement (ACE) to automatically improve its contrast by continuously embedding data. Some existing methods may make the detailed information in the dark regions of the grayscale image not well presented. Furthermore, these methods sometimes suffer from low embedding capacity (EC). Therefore, we propose an RDH method with ACE and high EC based on multi-type histogram modification. A pixel value histogram modification method is proposed to improve the contrast automatically. In this method, two-sided histogram expansion is used to improve global contrast, and then the histogram right-shift method is used to enhance the dark regions. Then, a prediction error histogram modification method is proposed to improve the EC. In this method, a new prediction method is proposed to better improve the EC. Experiment results show that compared with some advanced methods, the proposed method performs better.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"109 ","pages":"Article 104450"},"PeriodicalIF":2.6,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143768971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-level cross-modal attention guided DIBR 3D image watermarking","authors":"Qingmo Chen , Zhang Wang , Zhouyan He , Ting Luo , Jiangtao Huang","doi":"10.1016/j.jvcir.2025.104455","DOIUrl":"10.1016/j.jvcir.2025.104455","url":null,"abstract":"<div><div>For depth-image-based rendering (DIBR) 3D images, both center and synthesized virtual views are subject to illegal distribution during transmission. To address the issue of copyright protection of DIBR 3D images, we propose a multi-level cross-modal attention guided network (MCANet) for 3D image watermarking. To optimize the watermark embedding process, the watermark adjustment module (WAM) is designed to extract cross-modal information at different scales, thereby calculating 3D image attention to adjust the watermark distribution. Furthermore, the nested dual output U-net (NDOU) is devised to enhance the compensatory capability of the skip connections, thus providing an effective global feature to the up-sampling process for high image quality. Compared to state-of-the-art (SOTA) 3D image watermarking methods, the proposed watermarking model shows superior performance in terms of robustness and imperceptibility.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"109 ","pages":"Article 104455"},"PeriodicalIF":2.6,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143746691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LDINet: Latent decomposition-interpolation for single image fast-moving objects deblatting","authors":"Haodong Fan, Dingyi Zhang, Yunlong Yu, Yingming Li","doi":"10.1016/j.jvcir.2025.104439","DOIUrl":"10.1016/j.jvcir.2025.104439","url":null,"abstract":"<div><div>The image of fast-moving objects (FMOs) usually contains a blur stripe indicating the blurred object that is mixed with the background. In this work we propose a novel Latent Decomposition-Interpolation Network (LDINet) to generate the appearances and shapes of the objects from the blurry stripe contained in the single image. In particular, we introduce an Decomposition-Interpolation Module (DIM) to break down the feature maps of the inputs into discrete time indexed parts and interpolate the target latent frames according to the provided time indexes with affine transformations, where the features are categorized into the scalar-like and gradient-like parts when warping in the interpolation. Finally, a decoder renders the prediction results. In addition, based on the results, a Refining Conditional Deblatting (RCD) approach is presented to further enhance the fidelity. Extensive experiments are conducted and have shown that the proposed methods achieve superior performances compared to the existing competing methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"109 ","pages":"Article 104439"},"PeriodicalIF":2.6,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143725251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VDD: Varied Drone Dataset for semantic segmentation","authors":"Wenxiao Cai, Ke Jin, Jinyan Hou, Cong Guo, Letian Wu, Wankou Yang","doi":"10.1016/j.jvcir.2025.104429","DOIUrl":"10.1016/j.jvcir.2025.104429","url":null,"abstract":"<div><div>Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to understand scenes on the ground. Ensuring high accuracy of semantic segmentation models for drones requires access to diverse, large-scale, and high-resolution datasets, which are often scarce in the field of aerial image processing. While existing datasets typically focus on urban scenes and are relatively small, our Varied Drone Dataset (VDD) addresses these limitations by offering a large-scale, densely labeled collection of 400 high-resolution images spanning 7 classes. This dataset features various scenes in urban, industrial, rural, and natural areas, captured from different camera angles and under diverse lighting conditions. We also make new annotations to UDD (Chen et al., 2018) and UAVid (Lyu et al., 2018), integrating them under VDD annotation standards, to create the Integrated Drone Dataset (IDD). We train seven state-of-the-art models on drone datasets as baselines. It is expected that our dataset will generate considerable interest in drone image segmentation and serve as a foundation for other drone vision tasks. Datasets are publicly available at <span><span>https://github.com/RussRobin/VDD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"109 ","pages":"Article 104429"},"PeriodicalIF":2.6,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143739111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}