Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers

Impact factor: 2.7. JCR quartile: Q2 (Engineering, Electrical & Electronic)

IEEE Open Journal of Signal Processing. Pub Date: 2025-01-30. DOI: 10.1109/OJSP.2025.3536853

Masahiro Kada; Ryota Yoshihashi; Satoshi Ikehata; Rei Kawakami; Ikuro Sato

Volume 6, pages 276-283. Full text: https://ieeexplore.ieee.org/document/10858379/


Abstract

Mixture-of-experts models with a sparse expert selection rule have been gaining much attention recently because they scale without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may impede the acquisition of appropriate invariance to input perturbations and thereby degrade model performance on tasks such as classification. To address this issue, we propose Pairwise Router Consistency (PRC), which effectively penalizes the discontinuities that occur under natural deformations of input images. When combined with the supervised loss, the PRC loss empirically improves classification accuracy on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets compared to a baseline method. Notably, our method with 1-expert selection slightly outperforms the baseline method using 2-expert selection. We also confirmed that models trained with our method experience discontinuous changes less frequently under input perturbations.
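
The discontinuity described in the abstract can be illustrated with a toy top-1 router. The sketch below is an assumption-laden illustration, not the paper's PRC formulation, which the abstract does not specify: the linear router, the helper names (`router_logits`, `top1_expert`, `pairwise_consistency_penalty`), and the choice of a symmetric KL divergence between the router distributions of two perturbed views are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def router_logits(x, weights):
    """Hypothetical linear router: one logit per expert (rows of `weights`)."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def top1_expert(x, weights):
    """Sparse selection rule: index of the expert with the largest logit."""
    logits = router_logits(x, weights)
    return max(range(len(logits)), key=lambda i: logits[i])


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two discrete distributions."""
    kl_pq = sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
    kl_qp = sum(qi * math.log((qi + eps) / (pi + eps)) for pi, qi in zip(p, q))
    return 0.5 * (kl_pq + kl_qp)


def pairwise_consistency_penalty(x_a, x_b, weights):
    """Assumed stand-in for a router consistency term (not the paper's loss):
    penalize disagreement between router distributions of two views."""
    p = softmax(router_logits(x_a, weights))
    q = softmax(router_logits(x_b, weights))
    return symmetric_kl(p, q)


# Two experts, two-dimensional input; all values are made up for illustration.
weights = [[1.0, 0.0], [0.0, 1.0]]
x = [0.51, 0.49]
x_perturbed = [0.49, 0.51]  # a small perturbation of x

# The selected expert flips even though the input barely changed.
print(top1_expert(x, weights), top1_expert(x_perturbed, weights))

# The penalty is positive for the pair and zero for identical views.
print(pairwise_consistency_penalty(x, x_perturbed, weights))
print(pairwise_consistency_penalty(x, x, weights))
```

In this toy setting the hard top-1 selection changes abruptly across the decision boundary, while the penalty varies smoothly with the router logits, which is the kind of signal a consistency term can feed back to the router during training.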
