Quantitative regularization in robust vision transformer for remote sensing image classification

Huaxiang Song, Yuxuan Yuan, Zhiwei Ouyang, Yu Yang, Hui Xiang
{"title":"用于遥感图像分类的鲁棒视觉变换器中的定量正则化","authors":"Huaxiang Song, Yuxuan Yuan, Zhiwei Ouyang, Yu Yang, Hui Xiang","doi":"10.1111/phor.12489","DOIUrl":null,"url":null,"abstract":"Vision Transformers (ViTs) are exceptional at vision tasks. However, when applied to remote sensing images (RSIs), existing methods often necessitate extensive modifications of ViTs to rival convolutional neural networks (CNNs). This requirement significantly impedes the application of ViTs in geosciences, particularly for researchers who lack the time for comprehensive model redesign. To address this issue, we introduce the concept of quantitative regularization (QR), designed to enhance the performance of ViTs in RSI classification. QR represents an effective algorithm that adeptly manages domain discrepancies in RSIs and can be integrated with any ViTs in transfer learning. We evaluated the effectiveness of QR using three ViT architectures: vanilla ViT, Swin‐ViT and Next‐ViT, on four datasets: AID30, NWPU45, AFGR50 and UCM21. The results reveal that our Next‐ViT model surpasses 39 other advanced methods published in the past 3 years, maintaining robust performance even with a limited number of training samples. We also discovered that our ViT and Swin‐ViT achieve significantly higher accuracy and robustness compared to other methods using the same backbone. Our findings confirm that ViTs can be as effective as CNNs for RSI classification, regardless of the dataset size. Our approach exclusively employs open‐source ViTs and easily accessible training strategies. Consequently, we believe that our method can significantly lower the barriers for geoscience researchers intending to use ViT for RSI applications.","PeriodicalId":22881,"journal":{"name":"The Photogrammetric Record","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Quantitative regularization in robust vision transformer for remote sensing image classification\",\"authors\":\"Huaxiang Song, Yuxuan Yuan, Zhiwei Ouyang, Yu Yang, Hui Xiang\",\"doi\":\"10.1111/phor.12489\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Vision Transformers (ViTs) are exceptional at vision tasks. However, when applied to remote sensing images (RSIs), existing methods often necessitate extensive modifications of ViTs to rival convolutional neural networks (CNNs). This requirement significantly impedes the application of ViTs in geosciences, particularly for researchers who lack the time for comprehensive model redesign. To address this issue, we introduce the concept of quantitative regularization (QR), designed to enhance the performance of ViTs in RSI classification. QR represents an effective algorithm that adeptly manages domain discrepancies in RSIs and can be integrated with any ViTs in transfer learning. We evaluated the effectiveness of QR using three ViT architectures: vanilla ViT, Swin‐ViT and Next‐ViT, on four datasets: AID30, NWPU45, AFGR50 and UCM21. The results reveal that our Next‐ViT model surpasses 39 other advanced methods published in the past 3 years, maintaining robust performance even with a limited number of training samples. We also discovered that our ViT and Swin‐ViT achieve significantly higher accuracy and robustness compared to other methods using the same backbone. Our findings confirm that ViTs can be as effective as CNNs for RSI classification, regardless of the dataset size. 
Our approach exclusively employs open‐source ViTs and easily accessible training strategies. Consequently, we believe that our method can significantly lower the barriers for geoscience researchers intending to use ViT for RSI applications.\",\"PeriodicalId\":22881,\"journal\":{\"name\":\"The Photogrammetric Record\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Photogrammetric Record\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1111/phor.12489\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Photogrammetric Record","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1111/phor.12489","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Vision Transformers (ViTs) are exceptional at vision tasks. However, when applied to remote sensing images (RSIs), existing methods often necessitate extensive modifications of ViTs to rival convolutional neural networks (CNNs). This requirement significantly impedes the application of ViTs in geosciences, particularly for researchers who lack the time for comprehensive model redesign. To address this issue, we introduce the concept of quantitative regularization (QR), designed to enhance the performance of ViTs in RSI classification. QR represents an effective algorithm that adeptly manages domain discrepancies in RSIs and can be integrated with any ViTs in transfer learning. We evaluated the effectiveness of QR using three ViT architectures: vanilla ViT, Swin‐ViT and Next‐ViT, on four datasets: AID30, NWPU45, AFGR50 and UCM21. The results reveal that our Next‐ViT model surpasses 39 other advanced methods published in the past 3 years, maintaining robust performance even with a limited number of training samples. We also discovered that our ViT and Swin‐ViT achieve significantly higher accuracy and robustness compared to other methods using the same backbone. Our findings confirm that ViTs can be as effective as CNNs for RSI classification, regardless of the dataset size. Our approach exclusively employs open‐source ViTs and easily accessible training strategies. Consequently, we believe that our method can significantly lower the barriers for geoscience researchers intending to use ViT for RSI applications.
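The abstract states that the approach relies only on open-source ViTs and easily accessible training strategies, but it does not describe the quantitative regularization (QR) algorithm itself. The sketch below therefore only illustrates the generic transfer-learning setup it refers to: loading an ImageNet-pretrained ViT from the open-source timm library and fine-tuning it on a remote sensing scene dataset such as AID30. The `qr_penalty` function is a hypothetical placeholder (a plain L2 penalty on the classifier head), not the authors' QR method.

```python
# Minimal sketch of transfer learning with an open-source ViT for RSI
# scene classification. The regularization term shown is a hypothetical
# stand-in; the paper's QR algorithm is not specified in the abstract.
import timm
import torch
import torch.nn as nn

NUM_CLASSES = 30  # e.g. the AID30 dataset has 30 scene categories

# Load an ImageNet-pretrained ViT and replace the classification head.
model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)

def qr_penalty(net: nn.Module, strength: float = 1e-4) -> torch.Tensor:
    """Placeholder regularizer: an L2 penalty on the classifier head.
    This is NOT the quantitative regularization proposed in the paper."""
    head = net.get_classifier()
    return strength * sum(p.pow(2).sum() for p in head.parameters())

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step: cross-entropy loss plus the placeholder term."""
    optimizer.zero_grad()
    logits = model(images)
    loss = criterion(logits, labels) + qr_penalty(model)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Swapping `"vit_base_patch16_224"` for a Swin or Next-ViT checkpoint available in the same library would mirror the three backbones evaluated in the paper, since the reported workflow keeps the backbone unmodified and changes only the training procedure.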