Title: Geo-scenes dissecting urban fabric: Understanding and recognition combining AI, remotely sensed data and multimodal spatial semantics
Authors: Hanqing Bao, Lanyue Zhou, Lukas W. Lehnert
DOI: 10.1016/j.isprsjprs.2025.10.011
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 230, Pages 716-737
Published: 2025-10-16 (Journal Article)
Impact factor: 12.2; JCR: Q1 (Geography, Physical)
URL: https://www.sciencedirect.com/science/article/pii/S0924271625003995
Citations: 0
Abstract
Urban fabric represents the intersection of spatial structure and social function. Analyzing its geographic components, functional semantics, and interactive relationships enables a deeper understanding of the formation and evolution of urban geo-scenes. Urban geo-scenes (UGS), as the fundamental units of urban systems, play a vital role in balancing and optimizing spatial layout, while enhancing urban resilience and vitality. Although multimodal spatial data are widely used to describe UGS, conventional approaches that rely solely on visual or social features are insufficient when addressing the complexity of modern urban systems. The spatial relationships and distributional patterns among urban elements are equally crucial for capturing the full semantic structure of urban geo-scenes. In parallel, most deep learning models still face limitations in effectively mining and fusing such diverse information. To address these challenges, we propose a multimodal deep learning framework for UGS recognition. Guided by the concepts of urban fabric and spatial co-location patterns, our method dissects the internal structure of geo-scenes and constructs a bottom-up urban fabric graph model to capture spatial semantics among geographic entities. Specifically, we employ a customized SE-DenseNet branch to extract deep physical and visual features from high-resolution satellite imagery, along with social semantic information from auxiliary data (e.g., POIs, building footprint coverage). A semantic fusion module is further introduced to enable collaborative interaction among multi-modal and multi-scale features. The framework was validated across four Chinese cities with varying sizes, economic levels, and cultural contexts. The proposed method achieved an overall accuracy of approximately 90%, outperforming existing state-of-the-art multimodal approaches. 
Moreover, ablation studies conducted in three cities of different scales confirm the critical role of urban fabric in UGS recognition. Our results demonstrate that the joint modeling of visual appearance, functional attributes, and spatial semantics offers a novel and more comprehensive understanding of urban geo-scenes.
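The abstract's "customized SE-DenseNet branch" builds on squeeze-and-excitation (SE) channel attention, which recalibrates feature-map channels before fusion. As a rough illustration of that mechanism only (the authors' actual architecture is not published here), the sketch below implements a single SE step in plain numpy; all names, shapes, and the reduction ratio are illustrative assumptions, not the paper's code.

```python
import numpy as np

def se_block(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Recalibrate the channels of a (C, H, W) feature map via SE attention.

    features : (C, H, W) activations, e.g. from a dense block
    w1       : (C//r, C) squeeze weights (r = reduction ratio, assumed here)
    w2       : (C, C//r) excitation weights
    """
    # Squeeze: global average pooling per channel -> vector of length C
    z = features.mean(axis=(1, 2))
    # Excitation: bottleneck MLP, ReLU then sigmoid gating in (0, 1)
    s = np.maximum(w1 @ z, 0.0)               # (C//r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # (C,)
    # Scale: reweight each channel map by its learned gate
    return features * gate[:, None, None]

# Toy usage: 8 channels, 4x4 spatial maps, reduction ratio r = 2
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((4, 8)) * 0.1
w2 = rng.standard_normal((8, 4)) * 0.1
y = se_block(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because the sigmoid gate lies strictly in (0, 1), each channel is attenuated rather than amplified; in a full network the gate weights are learned jointly with the convolutional features, letting informative channels dominate the downstream fusion.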
Journal Introduction:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) is the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It serves as a platform for scientists and professionals worldwide working in disciplines that use photogrammetry, remote sensing, spatial information systems, computer vision, and related fields, facilitating the communication and dissemination of advances in these disciplines while acting as a comprehensive source of reference and archive.
P&RS publishes high-quality, peer-reviewed research papers that should be original and not previously published. Papers may cover scientific/research, technological-development, or application/practical aspects. The journal also welcomes papers based on presentations at ISPRS meetings, provided they constitute significant contributions to the fields above.
In particular, P&RS encourages submissions that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, address topics that have received limited attention in P&RS or related journals, or explore new scientific or professional directions. Theoretical papers should preferably include practical applications, while papers on systems and applications should include a theoretical background.