BihoT: A Large-Scale Dataset and Benchmark for Hyperspectral Camouflaged Object Tracking
Authors: Hanzheng Wang; Wei Li; Xiang-Gen Xia; Qian Du
Journal: IEEE Transactions on Neural Networks and Learning Systems, vol. 36, no. 9, pp. 16392-16406 (impact factor 8.9; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1109/TNNLS.2025.3564059
Published: 2025-03-06
URL: https://ieeexplore.ieee.org/document/10988886/
Citation count: 0
Abstract
Hyperspectral object tracking (HOT) has many important applications, particularly in scenes where objects are camouflaged. Existing trackers can retrieve objects effectively via band regrouping because of a bias in existing HOT datasets, where most objects are distinguished by visual appearance rather than by spectral characteristics. This bias lets a tracker directly use visual features obtained from false-color images generated from hyperspectral images (HSIs), without extracting spectral features. To tackle this bias, a tracker should focus on spectral information when object appearance is unreliable. We therefore introduce a new task, hyperspectral camouflaged object tracking (HCOT), and construct a large-scale HCOT dataset, BihoT, consisting of 41,912 HSIs covering 49 video sequences. The dataset covers various artificial camouflage scenes in which objects have similar appearances, diverse spectra, and frequent occlusion (OCC), making it challenging for HCOT. We also propose a simple but effective baseline model, the spectral prompt-based distractor-aware network (SPDAN), comprising a spectral embedding network (SEN), a spectral prompt-based backbone network (SPBN), and a distractor-aware module (DAM). Specifically, the SEN extracts spectral-spatial features via 3-D and 2-D convolutions to form a refined prompt representation. The SPBN then fine-tunes powerful RGB trackers with spectral prompts, which alleviates the shortage of training samples. Finally, the DAM uses a novel statistic to detect distractors caused by occlusion from objects and background, and corrects the resulting deterioration in tracking performance via a novel motion predictor. Extensive experiments demonstrate that the proposed SPDAN achieves state-of-the-art performance on BihoT and other HOT datasets.
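The band regrouping mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `regroup_bands`, the choice of consecutive bands, the 16-band cube, and the way the last group is completed are all assumptions made for illustration. The idea is only that an HSI cube is split into three-band false-color images that an RGB tracker can consume, which is why appearance-driven datasets allow trackers to skip spectral modeling.

```python
import numpy as np


def regroup_bands(cube, order=None):
    """Split an (H, W, B) hyperspectral cube into 3-band false-color images.

    `order` is a hypothetical band ordering; by default bands are taken
    consecutively (an assumption, not a rule from the paper).
    """
    if cube.ndim != 3:
        raise ValueError("expected a cube of shape (H, W, B)")
    n_bands = cube.shape[2]
    order = np.arange(n_bands) if order is None else np.asarray(order)
    # If the band count is not a multiple of 3, reuse the trailing bands
    # to complete the last group (one simple choice among several).
    pad = (-len(order)) % 3
    if pad:
        order = np.concatenate([order, order[-pad:]])
    groups = order.reshape(-1, 3)
    # Fancy indexing on the last axis yields one (H, W, 3) image per group.
    return [cube[:, :, g] for g in groups]


# Toy example: a 4 x 5 frame with 16 spectral bands (hypothetical size).
cube = np.arange(4 * 5 * 16, dtype=np.float32).reshape(4, 5, 16)
images = regroup_bands(cube)
print(len(images), images[0].shape)
```

Each returned image keeps only three of the bands, so any cue that lives in the joint spectral signature is lost unless the tracker models it separately, which is the gap the HCOT task targets.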
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.