Vision Mamba Distillation for Low-Resolution Fine-Grained Image Classification

IF 3.2, Zone 2 (Engineering & Technology), Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Signal Processing Letters Pub Date : 2025-04-22 DOI:10.1109/LSP.2025.3563441

Yao Chen;Jiabao Wang;Peichao Wang;Rui Zhang;Yang Li

IEEE Signal Processing Letters, vol. 32, pp. 1965-1969. Journal Article. https://ieeexplore.ieee.org/document/10974477/

Citations: 0

Abstract

Low-resolution fine-grained image classification has recently made significant progress, largely thanks to super-resolution techniques and knowledge distillation methods. However, these approaches lead to an exponential increase in the number of model parameters and in computational complexity. To address this problem, this letter proposes a Vision Mamba Distillation (ViMD) approach that improves both the effectiveness and the efficiency of low-resolution fine-grained image classification. Concretely, we propose a lightweight super-resolution vision Mamba classification network (SRVM-Net), whose classification sub-network is redesigned with Mamba modeling to improve its ability to extract visual features. Moreover, we design a novel multi-level Mamba knowledge distillation loss that transfers prior knowledge from a high-resolution vision Mamba classification network (HRVM-Net), acting as the teacher, to the proposed SRVM-Net, acting as the student. Extensive experiments on seven public fine-grained classification benchmark datasets confirm that ViMD achieves new state-of-the-art performance. ViMD attains higher accuracy than comparable methods while using fewer parameters and FLOPs, which makes it better suited to embedded-device applications.
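
The abstract does not give the form of the multi-level distillation loss, so the following is only a minimal sketch of a generic teacher-student loss with multiple feature levels, not the authors' formulation. It assumes the student is trained with a cross-entropy term, a mean-squared-error term between student and teacher features at each level, and a temperature-softened KL term on the logits. All function names, the weights `alpha` and `beta`, and the default `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_level_distill_loss(student_feats, teacher_feats,
                             student_logits, teacher_logits, label,
                             temperature=4.0, alpha=1.0, beta=1.0):
    """Generic multi-level distillation loss (illustrative assumption only).

    student_feats / teacher_feats: one feature vector per level.
    label: index of the ground-truth class.
    """
    # Supervised term: cross-entropy of the student on the true label.
    ce = -math.log(softmax(student_logits)[label] + 1e-12)

    # Feature term: MSE to the teacher, averaged over the levels.
    feat = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    feat /= len(student_feats)

    # Logit term: KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 as is conventional in knowledge distillation.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kd = temperature ** 2 * kl_divergence(p_teacher, p_student)

    return ce + alpha * feat + beta * kd


# Toy example: two feature levels, three classes.
teacher_feats = [[0.2, 0.4], [1.0, 0.0, -1.0]]
student_feats = [[0.1, 0.5], [0.8, 0.1, -0.9]]
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.5, 0.7, -0.8]
loss = multi_level_distill_loss(student_feats, teacher_feats,
                                student_logits, teacher_logits, label=0)
```

In this sketch, both distillation terms vanish when the student reproduces the teacher's features and logits exactly, leaving only the cross-entropy term.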

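Source journal

IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.40
Self-citation rate: 12.80%
Publication volume: 339
Review time: 2.8 months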

About the journal: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented, within one year of their appearance, at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also at several workshops organized by the Signal Processing Society.