Wenfang Sun;Yuedong Tan;Jingyuan Li;Shuwei Hou;Xiaobo Li;Yingzhao Shao;Zhe Wang;Beibei Song
Title: HotMoE: Exploring Sparse Mixture-of-Experts for Hyperspectral Object Tracking
Journal: IEEE Transactions on Multimedia, vol. 27, pp. 4072-4083 (Impact Factor 9.7; JCR Q1, Computer Science, Information Systems)
Published: 2025-01-27
DOI: 10.1109/TMM.2025.3535339
URL: https://ieeexplore.ieee.org/document/10855488/
Citations: 0
Abstract
Hyperspectral videos contain richer spectral and physical features than RGB videos and thus have greater potential for object tracking. The mainstream approach to hyperspectral object tracking integrates multiple RGB-based video tracking models. Although such ensembles can effectively exploit spectral information and improve tracker performance, their high computational complexity makes it difficult to meet the real-time requirements of video object tracking. To bridge this gap, we propose HotMoE, a new hyperspectral object tracking framework based on Mixture-of-Experts (MoE). HotMoE follows a divide-and-conquer strategy in which only a subset of expert models is computed for each input, reducing computational complexity while maintaining performance. We first design a splitter that groups the spectral bands into multiple false-color images based on spectral correlations. We then design a hyperspectral MoE router that adaptively learns to aggregate spectral image feature information and route it to suitable experts. Because different experts handle different scenarios, HotMoE can combine their capabilities to obtain better overall performance. Compared with previous state-of-the-art hyperspectral object tracking networks, our model significantly reduces inference time while remaining accurate, reaching a processing speed of 43.7 FPS and an AUC of 0.704 on the HOT2022 dataset.
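The abstract's central idea, computing only a subset of experts for each input, can be illustrated with a generic top-k sparse routing sketch. This is not the HotMoE implementation: the function names (`route_top_k`, `sparse_moe`), the toy scaling experts, and the hand-written gate logits are all assumptions made for illustration. In a real tracker the logits would come from a learned router and the experts would be neural network modules.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    if not 1 <= k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))

def sparse_moe(features, gate_logits, experts, k=2):
    """Evaluate only the k selected experts and return their weighted sum."""
    selected = route_top_k(gate_logits, k)
    output = [0.0] * len(features)
    for idx, weight in selected:
        expert_out = experts[idx](features)  # unselected experts are never called
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, [idx for idx, _ in selected]

# Hypothetical toy experts: each one just scales the feature vector.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]

# Hypothetical gate logits; a learned router would produce these per input.
out, used = sparse_moe([1.0, 1.0], gate_logits=[0.1, 2.0, -1.0, 2.0],
                       experts=experts, k=2)
print(used, out)
```

With four experts and k=2, only half of the expert computations run for this input, which is the source of the inference-time savings that sparse routing aims for. The cost of the gate itself is small compared with running every expert.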
About the journal:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.