Fusion Tree Network for RGBT Tracking
Zhiyuan Cheng, Andong Lu, Zhang Zhang, Chenglong Li, Liang Wang
2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
DOI: 10.1109/AVSS56176.2022.9959406
Published: 2022-11-29
Citations: 1
Abstract
RGBT tracking is often challenged by complex scenes (e.g., occlusion, scale change, and noisy backgrounds). Existing works usually adopt a single-strategy fusion scheme to handle modality fusion in all scenarios. However, the limited capacity of such a fusion model makes it difficult to fully integrate the discriminative features of the two modalities. To tackle this problem, we propose a Fusion Tree Network (FTNet), a high-capacity, multi-strategy fusion model that efficiently fuses the different modalities. Specifically, we combine three kinds of attention modules (channel attention, spatial attention, and location attention) in a tree structure to achieve multi-path hybrid attention in the deeper convolutional stages of the tracking network. Extensive experiments on three RGBT tracking datasets show that our method achieves superior performance compared with state-of-the-art RGBT tracking models.
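The tree-structured, multi-path hybrid attention described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: FTNet's attention modules are learned convolutional blocks, and the abstract does not specify how paths are merged. Here the parameter-free sigmoid gates, the use of only two of the three attention types, and the simple averaging of paths are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # Gate each channel of a (C, H, W) feature map by its
    # global-average activation (a crude, parameter-free stand-in
    # for a learned channel-attention module).
    w = sigmoid(feat.mean(axis=(1, 2)))        # (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # Gate each spatial location by the channel-mean activation.
    w = sigmoid(feat.mean(axis=0))             # (H, W)
    return feat * w[None, :, :]

def fusion_tree(rgb, tir):
    # Hypothetical two-path tree: concatenate the RGB and thermal
    # features along channels, send the result down two attention
    # branches, and average the paths. FTNet's real tree also
    # includes a location-attention branch and learned fusion.
    x = np.concatenate([rgb, tir], axis=0)     # (2C, H, W)
    paths = [channel_attention(x), spatial_attention(x)]
    return np.mean(paths, axis=0)

rgb = np.random.rand(4, 8, 8).astype(np.float32)   # toy RGB features
tir = np.random.rand(4, 8, 8).astype(np.float32)   # toy thermal features
fused = fusion_tree(rgb, tir)
print(fused.shape)  # (8, 8, 8)
```

The point of the sketch is the branching itself: each path applies a different attention strategy to the same fused features, so the merged output mixes several fusion strategies rather than committing to one.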