STAR: A Unified Spatiotemporal Fusion Framework for Satellite Video Object Tracking
Yuzeng Chen; Qiangqiang Yuan; Yi Xiao; Yuqi Tang; Jiang He; Te Han
IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-22, published 2025-07-02
DOI: 10.1109/TGRS.2025.3585112
https://ieeexplore.ieee.org/document/11063306/
Impact factor: 8.6 (JCR Q1, Engineering, Electrical & Electronic)
Citations: 0
Abstract
Satellite video object tracking (SVOT) delivers comprehensive spatiotemporal insights for Earth surface observation. Existing SVOT methods, however, face several critical challenges that cap their performance: data scarcity, modality restrictions, paradigm gaps, and underutilization of multidimensional features. This study proposes STAR, a unified spatiotemporal fusion framework for SVOT, to mitigate these issues. To optimize satellite video (SV) scenes, STAR first introduces a scene enhancement module (SEM) that generates enhanced multimodal representations. Next, the extraction-correlation-adaptation module (ECAM) incorporates a multimodal hierarchical Transformer architecture with local and unified relation modeling, jointly achieving feature extraction, relation learning, and domain adaptation. In addition, a temporal decoding structure integrates deep temporal features via attention propagation. Finally, the inertial navigation module (INM) models physical temporal features; it includes an awareness selector that assesses tracking confidence and uncertainty, and an inertial navigation scheme that handles anomalous interference and maintains a continuous trajectory. Inspired by the prompt learning pattern, STAR introduces a minimal number of tunable parameters yet achieves competitive performance across various SVOT benchmarks. Implementation details and evaluation results are available at https://github.com/YZCU/STAR
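The abstract does not say how the INM is implemented. Purely to illustrate the general idea of gating a tracker's output on confidence and falling back to motion extrapolation, here is a minimal sketch. Everything in it is an assumption and not taken from the paper: the class name `InertialFallback`, the constant-velocity motion model, the fixed `threshold`, and the `momentum` smoothing factor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class InertialFallback:
    """Hypothetical confidence-gated constant-velocity fallback.

    This is NOT STAR's INM; it only illustrates the general pattern.
    """
    threshold: float = 0.5            # assumed confidence cutoff
    momentum: float = 0.8             # assumed velocity smoothing factor
    position: Optional[Point] = None  # last accepted target center
    velocity: Point = (0.0, 0.0)      # smoothed per-frame displacement

    def update(self, predicted: Point, confidence: float) -> Point:
        # First frame: accept the prediction unconditionally.
        if self.position is None:
            self.position = predicted
            return self.position

        if confidence >= self.threshold:
            # Confident frame: trust the tracker and refresh the velocity.
            dx = predicted[0] - self.position[0]
            dy = predicted[1] - self.position[1]
            m = self.momentum
            self.velocity = (
                m * self.velocity[0] + (1.0 - m) * dx,
                m * self.velocity[1] + (1.0 - m) * dy,
            )
            self.position = predicted
        else:
            # Uncertain frame: ignore the prediction and extrapolate
            # along the last smoothed velocity instead.
            self.position = (
                self.position[0] + self.velocity[0],
                self.position[1] + self.velocity[1],
            )
        return self.position
```

In this sketch, a low-confidence frame (for example, during occlusion) leaves the velocity estimate untouched, so the trajectory continues smoothly until a confident prediction arrives again.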
About the journal:
IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space; and the processing, interpretation, and dissemination of this information.