{"title":"SSF-Net:用于高光谱目标跟踪的光谱角感知空间-光谱融合网络","authors":"Hanzheng Wang;Wei Li;Xiang-Gen Xia;Qian Du;Jing Tian","doi":"10.1109/TIP.2025.3572812","DOIUrl":null,"url":null,"abstract":"Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction, resulting in limited exploration of spectral information and difficulties in achieving complementary representations of object features. In this paper, a spatial-spectral fusion network with spectral angle awareness (SSF-Net) is proposed for hyperspectral (HS) object tracking. Firstly, to address the issue of insufficient spectral feature extraction in existing networks, a spatial-spectral feature backbone (<inline-formula> <tex-math>$S^{2}$ </tex-math></inline-formula>FB) is designed. With the spatial and spectral extraction branch, a joint representation of texture and spectrum is obtained. Secondly, a spectral attention fusion module (SAFM) is presented to capture the intra- and inter-modality correlation to obtain the fused features from the HS and RGB modalities. It can incorporate the visual information into the HS context to form a robust representation. Thirdly, to ensure a more accurate response to the object position, a spectral angle awareness module (SAAM) is designed to investigate the region-level spectral similarity between the template and search images during the prediction stage. Furthermore, a novel spectral angle awareness loss (SAAL) is developed to offer guidance for the SAAM based on similar regions. Finally, to obtain the robust tracking results, a weighted prediction method is considered to combine the HS and RGB predicted motions of objects to leverage the strengths of each modality. Extensive experiments on the HOTC-2020, HOTC-2024, and BihoT datasets demonstrate the effectiveness of the proposed SSF-Net compared with state-of-the-art trackers. The source code will be available at <uri>https://github.com/hzwyhc/hsvt</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3518-3532"},"PeriodicalIF":13.7000,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SSF-Net: Spatial-Spectral Fusion Network With Spectral Angle Awareness for Hyperspectral Object Tracking\",\"authors\":\"Hanzheng Wang;Wei Li;Xiang-Gen Xia;Qian Du;Jing Tian\",\"doi\":\"10.1109/TIP.2025.3572812\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction, resulting in limited exploration of spectral information and difficulties in achieving complementary representations of object features. In this paper, a spatial-spectral fusion network with spectral angle awareness (SSF-Net) is proposed for hyperspectral (HS) object tracking. 
Firstly, to address the issue of insufficient spectral feature extraction in existing networks, a spatial-spectral feature backbone (<inline-formula> <tex-math>$S^{2}$ </tex-math></inline-formula>FB) is designed. With the spatial and spectral extraction branch, a joint representation of texture and spectrum is obtained. Secondly, a spectral attention fusion module (SAFM) is presented to capture the intra- and inter-modality correlation to obtain the fused features from the HS and RGB modalities. It can incorporate the visual information into the HS context to form a robust representation. Thirdly, to ensure a more accurate response to the object position, a spectral angle awareness module (SAAM) is designed to investigate the region-level spectral similarity between the template and search images during the prediction stage. Furthermore, a novel spectral angle awareness loss (SAAL) is developed to offer guidance for the SAAM based on similar regions. Finally, to obtain the robust tracking results, a weighted prediction method is considered to combine the HS and RGB predicted motions of objects to leverage the strengths of each modality. Extensive experiments on the HOTC-2020, HOTC-2024, and BihoT datasets demonstrate the effectiveness of the proposed SSF-Net compared with state-of-the-art trackers. The source code will be available at <uri>https://github.com/hzwyhc/hsvt</uri>\",\"PeriodicalId\":94032,\"journal\":{\"name\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"volume\":\"34 \",\"pages\":\"3518-3532\"},\"PeriodicalIF\":13.7000,\"publicationDate\":\"2025-03-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11018216/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11018216/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
SSF-Net: Spatial-Spectral Fusion Network With Spectral Angle Awareness for Hyperspectral Object Tracking
Hyperspectral video (HSV) offers valuable spatial, spectral, and temporal information simultaneously, making it highly suitable for handling challenges such as background clutter and visual similarity in object tracking. However, existing methods primarily focus on band regrouping and rely on RGB trackers for feature extraction, resulting in limited exploitation of spectral information and difficulty in achieving complementary representations of object features. In this paper, a spatial-spectral fusion network with spectral angle awareness (SSF-Net) is proposed for hyperspectral (HS) object tracking. Firstly, to address the insufficient spectral feature extraction of existing networks, a spatial-spectral feature backbone ($S^{2}$FB) is designed; with its spatial and spectral extraction branches, a joint representation of texture and spectrum is obtained. Secondly, a spectral attention fusion module (SAFM) is presented to capture intra- and inter-modality correlations and obtain fused features from the HS and RGB modalities; it incorporates visual information into the HS context to form a robust representation. Thirdly, to ensure a more accurate response to the object position, a spectral angle awareness module (SAAM) is designed to investigate the region-level spectral similarity between the template and search images during the prediction stage. Furthermore, a novel spectral angle awareness loss (SAAL) is developed to guide the SAAM based on similar regions. Finally, to obtain robust tracking results, a weighted prediction method is adopted to combine the HS and RGB predicted object motions and leverage the strengths of each modality. Extensive experiments on the HOTC-2020, HOTC-2024, and BihoT datasets demonstrate the effectiveness of the proposed SSF-Net compared with state-of-the-art trackers. The source code will be available at https://github.com/hzwyhc/hsvt.
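Two of the abstract's building blocks rest on standard operations that are easy to reproduce: the spectral angle between two spectra, theta = arccos(<x, y> / (||x|| * ||y||)), which underlies the region-level similarity used by the SAAM, and a convex weighting of the HS and RGB predictions used in the final fusion step. The NumPy sketch below illustrates only these two ideas; the function names, the mean-spectrum pooling of the template, and the fixed fusion weights are illustrative assumptions and do not reflect the authors' actual implementation.

```python
import numpy as np


def spectral_angle_map(template_hs, search_hs, eps=1e-8):
    """Region-level spectral angle between a template and a search region.

    template_hs: (H_t, W_t, B) hyperspectral template patch
    search_hs:   (H_s, W_s, B) hyperspectral search region
    Returns an (H_s, W_s) map of angles in radians; smaller means more
    spectrally similar to the template.
    """
    bands = template_hs.shape[-1]
    # Summarize the template by its mean spectrum (an illustrative choice).
    ref = template_hs.reshape(-1, bands).mean(axis=0)          # (B,)
    pix = search_hs.reshape(-1, bands)                         # (N, B)

    cos_sim = (pix @ ref) / (np.linalg.norm(pix, axis=1) * np.linalg.norm(ref) + eps)
    angles = np.arccos(np.clip(cos_sim, -1.0, 1.0))
    return angles.reshape(search_hs.shape[:2])


def weighted_box_fusion(box_hs, box_rgb, w_hs=0.6, w_rgb=0.4):
    """Convex combination of HS and RGB predicted boxes (x, y, w, h).

    The weights here are fixed placeholders; a real tracker would set them
    from per-modality confidence.
    """
    return w_hs * np.asarray(box_hs, float) + w_rgb * np.asarray(box_rgb, float)


# Toy usage with random data: a 16-band template and search region.
rng = np.random.default_rng(0)
template = rng.random((31, 31, 16))
search = rng.random((127, 127, 16))
angle_map = spectral_angle_map(template, search)     # low angle = spectrally similar
fused_box = weighted_box_fusion([60, 60, 30, 30], [62, 58, 32, 28])
```

In this sketch the angle map plays the role of a region-level similarity prior that could reweight a tracker's response map, and the box fusion stands in for the weighted combination of HS and RGB motion predictions described in the abstract.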