STAR: A Unified Spatiotemporal Fusion Framework for Satellite Video Object Tracking

Impact Factor: 8.6 · CAS Region 1 (Earth Science) · JCR Q1 (Engineering, Electrical & Electronic)
Yuzeng Chen;Qiangqiang Yuan;Yi Xiao;Yuqi Tang;Jiang He;Te Han
DOI: 10.1109/TGRS.2025.3585112
Journal: IEEE Transactions on Geoscience and Remote Sensing, vol. 63, pp. 1-22
Published: 2025-07-02 (Journal Article)
IEEE Xplore: https://ieeexplore.ieee.org/document/11063306/
Citations: 0

Abstract

Satellite video object tracking (SVOT) delivers comprehensive spatiotemporal insights for Earth surface observation, yet existing SVOT methods confront several critical challenges, including data scarcity, modality restrictions, paradigm gaps, and underutilization of multidimensional features, sealing the performance ceiling. This study proposes STAR, a unified spatiotemporal fusion framework for SVOT, mitigating these issues. To optimize satellite video (SV) scenes, STAR first introduces a scene enhancement module (SEM) for generating enhanced multimodal representations. Then, the extraction-correlation-adaptation module (ECAM) is designed, incorporating a multimodal hierarchical Transformer architecture with local and unified relation modeling, which jointly achieves feature extraction, relation learning, and domain adaptation. In addition, the temporal decoding structure is introduced to integrate deep temporal features via attention propagation. Finally, the inertial navigation module (INM) models physical temporal features, including an awareness selector to assess the tracking confidence-uncertainty and an inertial navigation scheme to manage anomalous interferences and continuous trajectory. Inspired by the prompt learning pattern, STAR introduces a minimal number of tunable parameters yet achieves competitive performance across various SVOT benchmarks. Implementation details and evaluation results are available at https://github.com/YZCU/STAR
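The abstract describes the inertial navigation module (INM) only at a high level: an awareness selector judges tracking confidence, and an inertial scheme bridges anomalous interference to keep the trajectory continuous. The paper's actual formulation is not reproduced here; the following minimal sketch (class name, threshold, and constant-velocity motion model are all illustrative assumptions, not the authors' code) conveys the general idea of confidence-gated inertial prediction:

```python
import numpy as np

class InertialFallbackTracker:
    """Illustrative confidence-gated inertial fallback (not the paper's INM).

    When per-frame tracking confidence is high, the observed target center is
    accepted and the velocity estimate is refreshed; when confidence drops
    (e.g., occlusion or anomalous interference), the observation is discarded
    and the trajectory is extrapolated with a constant-velocity motion model.
    """

    def __init__(self, conf_threshold=0.5, momentum=0.8):
        self.conf_threshold = conf_threshold  # "awareness selector" cutoff (assumed)
        self.momentum = momentum              # exponential smoothing for velocity
        self.prev_center = None
        self.velocity = np.zeros(2)

    def update(self, center, confidence):
        center = np.asarray(center, dtype=float)
        if self.prev_center is None:
            # First frame: nothing to extrapolate from, accept the observation.
            self.prev_center = center
            return center
        if confidence >= self.conf_threshold:
            # Reliable observation: accept it and update the velocity estimate.
            inst_v = center - self.prev_center
            self.velocity = self.momentum * self.velocity + (1 - self.momentum) * inst_v
            self.prev_center = center
            return center
        # Low confidence: ignore the likely-corrupted observation and propagate
        # the previous position along the estimated velocity (inertial step).
        predicted = self.prev_center + self.velocity
        self.prev_center = predicted
        return predicted
```

Under this sketch, a sudden low-confidence detection (e.g., the target passing under cloud cover) does not teleport the track; the estimate coasts along the recent motion direction until confidence recovers.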
Source Journal
IEEE Transactions on Geoscience and Remote Sensing (Engineering & Technology: Geochemistry & Geophysics)
CiteScore: 11.50
Self-citation rate: 28.00%
Articles per year: 1912
Review time: 4.0 months
Journal Description: IEEE Transactions on Geoscience and Remote Sensing (TGRS) is a monthly publication that focuses on the theory, concepts, and techniques of science and engineering as applied to sensing the land, oceans, atmosphere, and space, and on the processing, interpretation, and dissemination of this information.