{"title":"SSF-Net:用于高光谱目标跟踪的光谱角感知空间-光谱融合网络","authors":"Hanzheng Wang;Wei Li;Xiang-Gen Xia;Qian Du;Jing Tian","doi":"10.1109/TIP.2025.3572812","DOIUrl":null,"url":null,"abstract":"Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction, resulting in limited exploration of spectral information and difficulties in achieving complementary representations of object features. In this paper, a spatial-spectral fusion network with spectral angle awareness (SSF-Net) is proposed for hyperspectral (HS) object tracking. Firstly, to address the issue of insufficient spectral feature extraction in existing networks, a spatial-spectral feature backbone (<inline-formula> <tex-math>$S^{2}$ </tex-math></inline-formula>FB) is designed. With the spatial and spectral extraction branch, a joint representation of texture and spectrum is obtained. Secondly, a spectral attention fusion module (SAFM) is presented to capture the intra- and inter-modality correlation to obtain the fused features from the HS and RGB modalities. It can incorporate the visual information into the HS context to form a robust representation. Thirdly, to ensure a more accurate response to the object position, a spectral angle awareness module (SAAM) is designed to investigate the region-level spectral similarity between the template and search images during the prediction stage. Furthermore, a novel spectral angle awareness loss (SAAL) is developed to offer guidance for the SAAM based on similar regions. Finally, to obtain the robust tracking results, a weighted prediction method is considered to combine the HS and RGB predicted motions of objects to leverage the strengths of each modality. Extensive experiments on the HOTC-2020, HOTC-2024, and BihoT datasets demonstrate the effectiveness of the proposed SSF-Net compared with state-of-the-art trackers. The source code will be available at <uri>https://github.com/hzwyhc/hsvt</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3518-3532"},"PeriodicalIF":13.7000,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SSF-Net: Spatial-Spectral Fusion Network With Spectral Angle Awareness for Hyperspectral Object Tracking\",\"authors\":\"Hanzheng Wang;Wei Li;Xiang-Gen Xia;Qian Du;Jing Tian\",\"doi\":\"10.1109/TIP.2025.3572812\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction, resulting in limited exploration of spectral information and difficulties in achieving complementary representations of object features. In this paper, a spatial-spectral fusion network with spectral angle awareness (SSF-Net) is proposed for hyperspectral (HS) object tracking. 
Firstly, to address the issue of insufficient spectral feature extraction in existing networks, a spatial-spectral feature backbone (<inline-formula> <tex-math>$S^{2}$ </tex-math></inline-formula>FB) is designed. With the spatial and spectral extraction branch, a joint representation of texture and spectrum is obtained. Secondly, a spectral attention fusion module (SAFM) is presented to capture the intra- and inter-modality correlation to obtain the fused features from the HS and RGB modalities. It can incorporate the visual information into the HS context to form a robust representation. Thirdly, to ensure a more accurate response to the object position, a spectral angle awareness module (SAAM) is designed to investigate the region-level spectral similarity between the template and search images during the prediction stage. Furthermore, a novel spectral angle awareness loss (SAAL) is developed to offer guidance for the SAAM based on similar regions. Finally, to obtain the robust tracking results, a weighted prediction method is considered to combine the HS and RGB predicted motions of objects to leverage the strengths of each modality. Extensive experiments on the HOTC-2020, HOTC-2024, and BihoT datasets demonstrate the effectiveness of the proposed SSF-Net compared with state-of-the-art trackers. The source code will be available at <uri>https://github.com/hzwyhc/hsvt</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"3518-3532\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-03-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11018216/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11018216/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SSF-Net: Spatial-Spectral Fusion Network With Spectral Angle Awareness for Hyperspectral Object Tracking
Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction, resulting in limited exploitation of spectral information and difficulty in achieving complementary representations of object features. In this paper, a spatial-spectral fusion network with spectral angle awareness (SSF-Net) is proposed for hyperspectral (HS) object tracking. Firstly, to address the insufficient spectral feature extraction of existing networks, a spatial-spectral feature backbone ($S^{2}$FB) is designed; with its spatial and spectral extraction branches, a joint representation of texture and spectrum is obtained. Secondly, a spectral attention fusion module (SAFM) is presented to capture intra- and inter-modality correlations and obtain fused features from the HS and RGB modalities; it incorporates visual information into the HS context to form a robust representation. Thirdly, to ensure a more accurate response to the object position, a spectral angle awareness module (SAAM) is designed to investigate the region-level spectral similarity between the template and search images during the prediction stage. Furthermore, a novel spectral angle awareness loss (SAAL) is developed to guide the SAAM based on similar regions. Finally, to obtain robust tracking results, a weighted prediction method is adopted to combine the HS and RGB predicted object motions and leverage the strengths of each modality. Extensive experiments on the HOTC-2020, HOTC-2024, and BihoT datasets demonstrate the effectiveness of the proposed SSF-Net compared with state-of-the-art trackers. The source code will be available at https://github.com/hzwyhc/hsvt.
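Two of the abstract's building blocks rest on standard operations that are easy to reproduce: the spectral angle between two spectra, theta = arccos(<x, y> / (||x|| * ||y||)), which underlies the region-level similarity used by the SAAM, and a convex weighting of the HS and RGB predictions used in the final fusion step. The NumPy sketch below illustrates only these two ideas; the function names, the mean-spectrum pooling of the template, and the fixed fusion weights are illustrative assumptions and do not reflect the authors' actual implementation.

```python
import numpy as np


def spectral_angle_map(template_hs, search_hs, eps=1e-8):
    """Region-level spectral angle between a template and a search region.

    template_hs: (H_t, W_t, B) hyperspectral template patch
    search_hs:   (H_s, W_s, B) hyperspectral search region
    Returns an (H_s, W_s) map of angles in radians; smaller means more
    spectrally similar to the template.
    """
    bands = template_hs.shape[-1]
    # Summarize the template by its mean spectrum (an illustrative choice).
    ref = template_hs.reshape(-1, bands).mean(axis=0)          # (B,)
    pix = search_hs.reshape(-1, bands)                         # (N, B)

    cos_sim = (pix @ ref) / (np.linalg.norm(pix, axis=1) * np.linalg.norm(ref) + eps)
    angles = np.arccos(np.clip(cos_sim, -1.0, 1.0))
    return angles.reshape(search_hs.shape[:2])


def weighted_box_fusion(box_hs, box_rgb, w_hs=0.6, w_rgb=0.4):
    """Convex combination of HS and RGB predicted boxes (x, y, w, h).

    The weights here are fixed placeholders; a real tracker would set them
    from per-modality confidence.
    """
    return w_hs * np.asarray(box_hs, float) + w_rgb * np.asarray(box_rgb, float)


# Toy usage with random data: a 16-band template and search region.
rng = np.random.default_rng(0)
template = rng.random((31, 31, 16))
search = rng.random((127, 127, 16))
angle_map = spectral_angle_map(template, search)     # low angle = spectrally similar
fused_box = weighted_box_fusion([60, 60, 30, 30], [62, 58, 32, 28])
```

In this sketch the angle map plays the role of a region-level similarity prior that could reweight a tracker's response map, and the box fusion stands in for the weighted combination of HS and RGB motion predictions described in the abstract.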