HotMoE: Exploring Sparse Mixture-of-Experts for Hyperspectral Object Tracking

IF 9.7, Zone 1 (Computer Science), JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

IEEE Transactions on Multimedia, Pub Date: 2025-01-27, DOI: 10.1109/TMM.2025.3535339

Wenfang Sun; Yuedong Tan; Jingyuan Li; Shuwei Hou; Xiaobo Li; Yingzhao Shao; Zhe Wang; Beibei Song

Volume 27, pages 4072-4083. Full text: https://ieeexplore.ieee.org/document/10855488/

Citations: 0

Abstract

Hyperspectral videos contain richer spectral and physical features than RGB videos and thus have greater potential for object tracking. The mainstream approach to hyperspectral object tracking integrates multiple RGB-based video tracking models. Although such ensembles can effectively exploit spectral information and improve tracker performance, their high computational complexity makes it difficult to meet the real-time requirements of video object tracking. To bridge this gap, we propose HotMoE, a new hyperspectral object tracking framework based on Mixture-of-Experts (MoE). HotMoE follows a divide-and-conquer strategy in which only a subset of expert models is computed for each input, reducing computational complexity while maintaining performance. We first design a splitter that groups the spectral bands into multiple false-color images based on spectral correlations. We then design a hyperspectral MoE router that adaptively learns to aggregate spectral image feature information and route it to suitable experts. Different experts handle different scenarios, and HotMoE combines their capabilities to obtain better overall performance. Compared with previous state-of-the-art hyperspectral object tracking networks, our model significantly reduces inference time while performing well, reaching a processing speed of 43.7 FPS and an AUC of 0.704 on the HOT2022 dataset.
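The two ideas named in the abstract, grouping bands by spectral correlation and computing only a subset of experts per input, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the grouping rule or the gating function, so the greedy correlation grouping, the top-k softmax gate, and every name below (`pearson`, `group_bands`, `route_top_k`, the toy experts and band values) are assumptions chosen for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def group_bands(bands, group_size=3):
    """Greedily group bands into false-color groups of `group_size`.

    Assumed rule (hypothetical): take the lowest-index unassigned band as a
    seed and attach the unassigned bands most correlated with it.
    Each band is given as a flat list of pixel values.
    """
    remaining = list(range(len(bands)))
    groups = []
    while remaining:
        seed = remaining.pop(0)
        # Rank the unassigned bands by correlation with the seed band.
        remaining.sort(key=lambda j: pearson(bands[seed], bands[j]), reverse=True)
        groups.append([seed] + remaining[:group_size - 1])
        # Restore index order for the bands that are still unassigned.
        remaining = sorted(remaining[group_size - 1:])
    return groups

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, experts, x, k=2):
    """Evaluate only the k highest-scoring experts and mix their outputs."""
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    # Renormalize the gate over the selected experts only.
    weights = softmax([gate_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Toy data: six "bands" of four pixels each (values are made up).
bands = [
    [1, 2, 3, 4], [2, 4, 6, 8.1], [4, 3, 2, 1],
    [8, 6, 4, 2.2], [1, 3, 2, 4], [1, 2, 3, 5],
]
groups = group_bands(bands, group_size=3)

# Toy experts standing in for tracking sub-networks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
out, chosen = route_top_k([0.1, 2.0, -1.0, 2.0], experts, 3.0, k=2)
```

In this sketch only `k` of the four experts are ever called, which is where the computational saving of a sparse MoE comes from; in a real tracker the gate logits would be produced by a learned router rather than passed in by hand.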


Source journal

IEEE Transactions on Multimedia (Engineering & Technology: Telecommunications)

CiteScore: 11.70

Self-citation rate: 11.00%

Articles published: 576

Review time: 5.5 months

Journal description: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.