MEET: A Million-Scale Dataset for Fine-Grained Geospatial Scene Classification With Zoom-Free Remote Sensing Imagery

IF 19.2 · Zone 1 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS

IEEE/CAA Journal of Automatica Sinica · Pub Date: 2025-03-15 · DOI: 10.1109/JAS.2025.125324

Yansheng Li;Yuning Wu;Gong Cheng;Chao Tao;Bo Dang;Yu Wang;Jiahao Zhang;Chuge Zhang;Yiting Liu;Xu Tang;Jiayi Ma;Yongjun Zhang

Vol. 12, No. 5, pp. 1004-1023 · https://ieeexplore.ieee.org/document/11005744/

Citations: 0

Abstract

Accurate fine-grained geospatial scene classification using remote sensing imagery is essential for a wide range of applications. However, existing approaches often rely on manually zooming remote sensing images to different scales to create typical scene samples, which fails to adequately support the fixed-resolution image interpretation required in real-world scenarios. To address this limitation, we introduce the million-scale fine-grained geospatial scene classification dataset (MEET), which contains over 1.03 million zoom-free remote sensing scene samples, manually annotated into 80 fine-grained categories. In MEET, each scene sample follows a scene-in-scene layout, where the central scene serves as the reference and auxiliary scenes provide crucial spatial context for fine-grained classification. Moreover, to tackle the emerging challenge of scene-in-scene classification, we present the context-aware transformer (CAT), a model specifically designed for this task. CAT adaptively fuses spatial context to accurately classify the scene samples by learning attentional features that capture the relationships between the center and auxiliary scenes. Based on MEET, we establish a comprehensive benchmark for fine-grained geospatial scene classification, evaluating CAT against 11 competitive baselines. The results demonstrate that CAT significantly outperforms these baselines, achieving a 1.88% higher balanced accuracy (BA) with the Swin-Large backbone and a notable 7.87% improvement with the Swin-Huge backbone. Further experiments validate the effectiveness of each module in CAT and show its practical applicability in urban functional zone mapping. The source code and dataset will be publicly available at https://jerrywyn.github.io/project/MEET.html.
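The abstract reports results in balanced accuracy (BA). As a minimal sketch, assuming the paper uses the standard definition of BA (the unweighted mean of per-class recall, which the abstract itself does not spell out), the metric can be computed as below. The function names and the class labels `class_a` and `class_b` are illustrative only and are not taken from MEET.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy labels: 8 samples of one class, 2 of another.
y_true = ["class_a"] * 8 + ["class_b"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["class_a"] * 10

print(overall_accuracy(y_true, y_pred))   # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case the majority-class predictor scores 0.8 on overall accuracy but only 0.5 on BA, because the minority class contributes a recall of 0. This is the usual reason to prefer BA when category sizes differ widely, which is plausible for a dataset with 80 fine-grained categories.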


Source journal

IEEE/CAA Journal of Automatica Sinica (Engineering: Control and Systems Engineering)

CiteScore: 23.50

Self-citation rate: 11.00%

Articles published: 880

About the journal: The IEEE/CAA Journal of Automatica Sinica publishes high-quality English-language papers on original theoretical and experimental research and development in the field of automation. Its scope covers automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control. The journal is abstracted and indexed in SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.