ViCxLSTM: An extended Long Short-term Memory vision transformer for complex remote sensing scene classification

Impact factor: 8.6 · JCR Q1 (Remote Sensing)

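International journal of applied earth observation and geoinformation : ITC journal, Volume 143, Article 104801 · Pub Date: 2025-09-01 · DOI: 10.1016/j.jag.2025.104801 · Full text: https://www.sciencedirect.com/science/article/pii/S1569843225004480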

Swalpa Kumar Roy , Ali Jamali , Koushik Biswas , Danfeng Hong , Pedram Ghamisi


Citations: 0

Abstract


ViCxLSTM: An extended Long Short-term Memory vision transformer for complex remote sensing scene classification

Scene classification plays a critical role in remote sensing image analysis, and numerous methods based on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been developed to improve performance on high-resolution remote sensing (HRRS) imagery. However, existing models struggle with several key challenges, including effectively capturing fine-grained local features and modeling long-range spatial dependencies in complex scenes. These limitations reduce the discriminative power of the extracted features, which is critical for HRRS image classification. To overcome these issues, our study aims to design a unified model that jointly leverages local information extraction, global context modeling, and long-range dependency learning. We propose a novel architecture, ViCxLSTM, designed to enhance feature discriminability for HRRS scene classification. ViCxLSTM is a hybrid model that integrates a Local Pattern Unit (comprising convolutional layers and Fourier Transforms), an extended Long Short-Term Memory module (xLSTM), and a Vision Transformer. This integrated architecture enables the model to capture a wide range of spatial patterns, from local textures to long-range dependencies and global contextual relationships. Experimental evaluations show that ViCxLSTM achieves superior classification performance across diverse land use datasets, outperforming several state-of-the-art models, including ResNet-50, ResNet-101, ResNet-152, ViT, LeViT, CrossViT, DeepViT, and CaiT. The code will be made freely available at https://github.com/aj1365/ViCxLSTM.
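The abstract says the Local Pattern Unit comprises convolutional layers and Fourier transforms, but it does not say how the two are combined. The toy sketch below is therefore only an illustration of the general idea and not the authors' implementation: it computes a spatial response (a zero-padded convolution) and a spectral response (the magnitude of a naive 2D discrete Fourier transform) for one small single-channel patch and returns them as two feature maps. The function names (conv2d_same, dft2_magnitude, local_pattern_features), the box kernel, and the two-map output are all assumptions made for this example; the xLSTM and Vision Transformer stages are not modeled.

```python
import cmath


def conv2d_same(img, kernel):
    """Cross-correlate img with an odd-sized kernel, zero-padded so the
    output has the same height and width as the input."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, x = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= x < w:  # outside pixels count as zero
                        acc += img[y][x] * kernel[a][b]
            out[i][j] = acc
    return out


def dft2_magnitude(img):
    """Magnitude of the naive 2D DFT (O(N^4); only suitable for toy sizes)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * cmath.pi * (u * y / h + v * x / w)
                    s += img[y][x] * cmath.exp(1j * angle)
            mag[u][v] = abs(s)
    return mag


def local_pattern_features(img, kernel):
    """Hypothetical fusion: return (spatial map, spectral map) side by side.
    How ViCxLSTM actually fuses the two is not stated in the abstract."""
    return conv2d_same(img, kernel), dft2_magnitude(img)


# A flat 4x4 patch and a 3x3 box (averaging) kernel.
patch = [[1.0] * 4 for _ in range(4)]
box = [[1.0 / 9.0] * 3 for _ in range(3)]
spatial, spectral = local_pattern_features(patch, box)
# For a flat patch, all spectral energy sits in the DC term spectral[0][0].
```

In a real model the convolution kernels would be learned and the transform would be computed with a fast FFT routine over many channels; the point here is only that the two branches see the same patch from complementary spatial and frequency views.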


Source journal

International journal of applied earth observation and geoinformation : ITC journal

Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences

CiteScore

12.00

Self-citation rate

0.00%

Articles published

Review time

77 days

About the journal: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.