Spatiotemporal Context Adapting Framework for Visual Object Tracking

IF 2.2, Zone 4 (Computer Science), JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

IET Image Processing, Pub Date: 2025-07-16, DOI: 10.1049/ipr2.70150

Kunlong Zhao, Dawei Zhao, Xu Wang, Liang Xiao, Yulong Huang, Yiming Nie, Yonggang Zhang, Bin Dai

Citations: 0

Abstract

Visual object tracking is widely applied in intelligent transportation systems and visual surveillance systems that serve smart cities, as well as in autonomous vehicles. Existing methods usually utilise a relation-modelling framework to model the visual object tracking problem, with auxiliary spatial context and temporal information. The spatial context is often extracted by enlarging the target template, which can introduce more background and positional information. The temporal correlation is obtained by associating the search image with previous images. However, due to noise interference, existing methods often partially exploit auxiliary data, leading to underutilisation of spatiotemporal information. To address these issues, we propose a novel and concise tracking framework, uniformly encoding all auxiliary data, including the enlarged target template, previous images, and corresponding target bounding boxes. Specifically, to mitigate the unstable factors introduced by these raw inputs, we propose a spatiotemporal context adaptive encoder, which can adaptively select appropriate information in noisy data. Extensive experiments show that the proposed method achieves state-of-the-art performance on various benchmarks, demonstrating its superiority.
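
The abstract gives no implementation details, so the following is only an illustrative sketch of the two generic ideas it mentions, not the authors' method. It assumes that "enlarging the target template" means scaling the target box about its centre, and it stands in for "adaptively selecting appropriate information" with a simple similarity-based top-k selection. The function names `enlarge_box` and `select_context_tokens`, and the parameters `factor` and `keep`, are hypothetical.

```python
import math


def enlarge_box(box, factor, img_w, img_h):
    """Scale an (x, y, w, h) box about its centre by `factor`,
    then clip the result to the image bounds."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * factor, h * factor
    x0 = max(0.0, cx - new_w / 2.0)
    y0 = max(0.0, cy - new_h / 2.0)
    x1 = min(float(img_w), cx + new_w / 2.0)
    y1 = min(float(img_h), cy + new_h / 2.0)
    return (x0, y0, x1 - x0, y1 - y0)


def select_context_tokens(query, tokens, keep):
    """Keep the `keep` context tokens most similar to `query`
    (scaled dot product) and return their indices together with
    softmax weights computed over the kept tokens only."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, tok)) / scale for tok in tokens]
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    peak = max(scores[i] for i in order)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in order]
    total = sum(exps)
    return order, [e / total for e in exps]


# A 20x20 target doubled in size inside a 100x100 image.
template_box = enlarge_box((40, 40, 20, 20), 2.0, 100, 100)

# Four toy context tokens; keep the two best matches to the query.
kept, weights = select_context_tokens(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [-1.0, 0.0]],
    keep=2,
)
```

A larger `factor` brings in more background and positional context but also more distractors, which is the kind of noise a selection step such as the second function is meant to filter out.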

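Source journal

IET Image Processing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 5.40

Self-citation rate: 8.70%

Articles published: 282

Review time: 6 months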

About the journal: The IET Image Processing journal encompasses research areas related to the generation, processing and communication of visual information. The focus of the journal is the coverage of the latest research results in image and video processing, including image generation and display, enhancement and restoration, segmentation, colour and texture analysis, coding and communication, implementations and architectures as well as innovative applications.

Principal topics include:

Generation and Display - Imaging sensors and acquisition systems, illumination, sampling and scanning, quantization, colour reproduction, image rendering, display and printing systems, evaluation of image quality.

Processing and Analysis - Image enhancement, restoration, segmentation, registration, multispectral, colour and texture processing, multiresolution processing and wavelets, morphological operations, stereoscopic and 3-D processing, motion detection and estimation, video and image sequence processing.

Implementations and Architectures - Image and video processing hardware and software, design and construction, architectures and software, neural, adaptive, and fuzzy processing.

Coding and Transmission - Image and video compression and coding, compression standards, noise modelling, visual information networks, streamed video.

Retrieval and Multimedia - Storage of images and video, database design, image retrieval, video annotation and editing, mixed media incorporating visual information, multimedia systems and applications, image and video watermarking, steganography.

Applications - Innovative application of image and video processing technologies to any field, including life sciences, earth sciences, astronomy, document processing and security.

Current Special Issue Call for Papers:

Evolutionary Computation for Image Processing - https://digital-library.theiet.org/files/IET_IPR_CFP_EC.pdf

AI-Powered 3D Vision - https://digital-library.theiet.org/files/IET_IPR_CFP_AIPV.pdf

Multidisciplinary advancement of Imaging Technologies: From Medical Diagnostics and Genomics to Cognitive Machine Vision, and Artificial Intelligence - https://digital-library.theiet.org/files/IET_IPR_CFP_IST.pdf

Deep Learning for 3D Reconstruction - https://digital-library.theiet.org/files/IET_IPR_CFP_DLR.pdf