{"title":"基于知识蒸馏的遥感图像无参数关注模型场景分类","authors":"Yubing Han, Zongyin Liu, Jiguo Yu, Anming Dong, Huihui Zhang","doi":"10.1109/MSN57253.2022.00077","DOIUrl":null,"url":null,"abstract":"Remote sensing image scene classification is to label remote sensing images as a specific scene category by understanding the semantic information of the images. It is an essential link in remote sensing image analysis and interpretation and has important research value. Convolutional neural networks (CNNs) have been dominant in remote sensing image scene classification due to their powerful feature extraction capabilities. The general trend has been to make deeper and wider CNN architectures to achieve higher classification accuracy. However, these advances to improve accuracy enlarge the network, creating too many parameters and high computational costs. Large models are difficult to deploy on resource-constrained edge devices for practical applications. Furthermore, CNNs can effectively capture local information but are weak in extracting global features. To overcome these drawbacks, we propose a novel knowledge distillation (KD) based method by employing Swin Transformer as a teacher network for guiding MobileNetV2 with Parameter-Free Attention (MobileNetV2-PFA). First, we modify MobileNetV2 by introducing PFA into the inverted bottleneck block; this improvement helps the model learn more latent and robust features without extra parameters. Second, Swin Transformer is an excellent architecture for capturing long-range dependencies via shifted window-based attention. So, we utilize the long-range dependency information from the Swin Transformer to assist MobileNetV2-PFA training through KD. Experimental results on the challenging NWPU-RESISC45 dataset show that the proposed method outperforms the original MobileNetV2 in classification accuracy with low computational consumption.","PeriodicalId":114459,"journal":{"name":"2022 18th International Conference on Mobility, Sensing and Networking (MSN)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Scene Classification Through Knowledge Distillation Enabled Parameter-Free Attention Model for Remote Sensing Images\",\"authors\":\"Yubing Han, Zongyin Liu, Jiguo Yu, Anming Dong, Huihui Zhang\",\"doi\":\"10.1109/MSN57253.2022.00077\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Remote sensing image scene classification is to label remote sensing images as a specific scene category by understanding the semantic information of the images. It is an essential link in remote sensing image analysis and interpretation and has important research value. Convolutional neural networks (CNNs) have been dominant in remote sensing image scene classification due to their powerful feature extraction capabilities. The general trend has been to make deeper and wider CNN architectures to achieve higher classification accuracy. However, these advances to improve accuracy enlarge the network, creating too many parameters and high computational costs. Large models are difficult to deploy on resource-constrained edge devices for practical applications. Furthermore, CNNs can effectively capture local information but are weak in extracting global features. 
To overcome these drawbacks, we propose a novel knowledge distillation (KD) based method by employing Swin Transformer as a teacher network for guiding MobileNetV2 with Parameter-Free Attention (MobileNetV2-PFA). First, we modify MobileNetV2 by introducing PFA into the inverted bottleneck block; this improvement helps the model learn more latent and robust features without extra parameters. Second, Swin Transformer is an excellent architecture for capturing long-range dependencies via shifted window-based attention. So, we utilize the long-range dependency information from the Swin Transformer to assist MobileNetV2-PFA training through KD. Experimental results on the challenging NWPU-RESISC45 dataset show that the proposed method outperforms the original MobileNetV2 in classification accuracy with low computational consumption.\",\"PeriodicalId\":114459,\"journal\":{\"name\":\"2022 18th International Conference on Mobility, Sensing and Networking (MSN)\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 18th International Conference on Mobility, Sensing and Networking (MSN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MSN57253.2022.00077\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 18th International Conference on Mobility, Sensing and Networking (MSN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSN57253.2022.00077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Remote sensing image scene classification assigns a remote sensing image to a specific scene category by understanding its semantic content. It is an essential step in remote sensing image analysis and interpretation and has important research value. Convolutional neural networks (CNNs) have dominated remote sensing image scene classification thanks to their powerful feature extraction capabilities, and the general trend has been toward deeper and wider CNN architectures in pursuit of higher classification accuracy. However, these accuracy gains enlarge the network, introducing many parameters and high computational cost, so large models are difficult to deploy on resource-constrained edge devices in practical applications. Furthermore, CNNs capture local information effectively but are weak at extracting global features. To overcome these drawbacks, we propose a novel knowledge distillation (KD) based method that employs a Swin Transformer as the teacher network to guide MobileNetV2 with Parameter-Free Attention (MobileNetV2-PFA). First, we modify MobileNetV2 by introducing PFA into the inverted bottleneck block; this helps the model learn more latent and robust features without adding parameters. Second, the Swin Transformer excels at capturing long-range dependencies via shifted window-based attention, so we transfer its long-range dependency information to MobileNetV2-PFA through KD during training. Experimental results on the challenging NWPU-RESISC45 dataset show that the proposed method outperforms the original MobileNetV2 in classification accuracy at low computational cost.
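The paper's code is not included here, so the following is a minimal PyTorch sketch of the two components the abstract describes, under stated assumptions: the parameter-free attention is assumed to be a SimAM-style energy-based module placed after the depthwise convolution of the inverted bottleneck, and the distillation objective is assumed to be the standard Hinton temperature-scaled KL loss. The names (PFA, InvertedResidualPFA, kd_loss) and hyperparameters (lam=1e-4, T=4.0, alpha=0.9) are illustrative, not taken from the paper.

```python
# Sketch only (not the authors' code): SimAM-style parameter-free attention
# inside a MobileNetV2 inverted bottleneck, plus a temperature-scaled KD loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PFA(nn.Module):
    """Parameter-free attention (assumed SimAM-style energy function)."""

    def __init__(self, lam: float = 1e-4):
        super().__init__()
        self.lam = lam  # regularizer in the energy term; no learned weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        n = h * w - 1
        # Squared deviation of each position from its channel mean.
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # Channel-wise variance estimate.
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # Inverse energy: lower energy marks a more important neuron.
        e_inv = d / (4 * (v + self.lam)) + 0.5
        return x * torch.sigmoid(e_inv)


class InvertedResidualPFA(nn.Module):
    """MobileNetV2 inverted bottleneck with PFA after the depthwise conv (assumed placement)."""

    def __init__(self, c_in: int, c_out: int, stride: int = 1, expand: int = 6):
        super().__init__()
        hidden = c_in * expand
        self.use_res = stride == 1 and c_in == c_out
        self.block = nn.Sequential(
            nn.Conv2d(c_in, hidden, 1, bias=False),          # expand 1x1
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),            # depthwise 3x3
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            PFA(),                                           # parameter-free attention
            nn.Conv2d(hidden, c_out, 1, bias=False),         # project 1x1
            nn.BatchNorm2d(c_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.block(x)
        return x + y if self.use_res else y


def kd_loss(student_logits, teacher_logits, labels,
            T: float = 4.0, alpha: float = 0.9):
    """Hinton-style KD: soft targets from the frozen Swin teacher plus hard labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

In such a setup the Swin Transformer teacher would run frozen in eval mode while only the MobileNetV2-PFA student is updated; because PFA introduces no learnable weights, the student retains the parameter count and FLOPs of plain MobileNetV2, which is consistent with the abstract's claim of low computational consumption on edge devices.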