Comprehensive urban space representation with varying numbers of street-level images

IF 7.1 · CAS Tier 1 (Earth Sciences) · JCR Q1 ENVIRONMENTAL STUDIES
Yingjing Huang, Fan Zhang, Yong Gao, Wei Tu, Fabio Duarte, Carlo Ratti, Diansheng Guo, Yu Liu
Journal: Computers, Environment and Urban Systems, Volume 106, Article 102043
DOI: 10.1016/j.compenvurbsys.2023.102043
Published: 2023-10-11
URL: https://www.sciencedirect.com/science/article/pii/S0198971523001060
Cited by: 0

Abstract

Street-level imagery has emerged as a valuable tool for observing large-scale urban spaces with unprecedented detail. However, previous studies have been limited to analyzing individual street-level images. This approach falls short in representing the characteristics of a spatial unit, such as a street or grid, which may contain varying numbers of street-level images ranging from several to hundreds. As a result, a more comprehensive and representative approach is required to capture the complexity and diversity of urban environments at different spatial scales. To address this issue, this study proposes a deep learning-based module called Vision-LSTM, which can effectively obtain vector representation from varying numbers of street-level images in spatial units. The effectiveness of the module is validated through experiments to recognize urban villages, achieving reliable recognition results (overall accuracy: 91.6%) through multimodal learning that combines street-level imagery with remote sensing imagery and social sensing data. Compared to existing image fusion methods, Vision-LSTM demonstrates significant effectiveness in capturing associations between street-level images. The proposed module can provide a more comprehensive understanding of urban spaces, enhancing the research value of street-level imagery and facilitating multimodal learning-based urban research. Our models are available at https://github.com/yingjinghuang/Vision-LSTM.
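The core idea described above — folding a variable number of per-image feature vectors (from several to hundreds per street or grid) into one fixed-size vector for the spatial unit — can be sketched with a minimal LSTM-style aggregator. This is an illustrative toy in NumPy, not the paper's released implementation; the class name `MiniVisionLSTM`, the dimensions, and the random feature inputs are all assumptions made here for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MiniVisionLSTM:
    """Toy sketch of a Vision-LSTM-style aggregator: fold a variable-length
    sequence of per-image feature vectors (e.g. CNN embeddings of the
    street-level images in one spatial unit) into a single fixed-size
    representation of that unit. Hyperparameters are illustrative only."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the four LSTM gates
        # (input, forget, cell candidate, output).
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, feat_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def aggregate(self, images):
        """images: (n_images, feat_dim) array; n_images varies per unit."""
        H = self.hidden_dim
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for x in images:
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])          # input gate
            f = sigmoid(z[H:2 * H])     # forget gate
            g = np.tanh(z[2 * H:3 * H]) # cell candidate
            o = sigmoid(z[3 * H:])      # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        return h  # fixed-size vector regardless of n_images

# Usage: two spatial units with different image counts yield vectors
# of identical shape, ready to fuse with remote/social sensing features.
model = MiniVisionLSTM(feat_dim=8, hidden_dim=4)
rng = np.random.default_rng(1)
unit_a = model.aggregate(rng.normal(size=(3, 8)))   # 3 images
unit_b = model.aggregate(rng.normal(size=(120, 8))) # 120 images
```

The design point this illustrates is why a recurrent aggregator suits the problem: unlike fixed-slot fusion (concatenation or padding to a maximum count), the recurrence consumes any number of images and, unlike simple mean-pooling, can capture associations between images in the sequence.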

Source journal metrics:
CiteScore: 13.30
Self-citation rate: 7.40%
Articles per year: 111
Review time: 32 days
Journal description: Computers, Environment and Urban Systems is an interdisciplinary journal publishing cutting-edge and innovative computer-based research on environmental and urban systems that privileges the geospatial perspective. The journal welcomes original, high-quality scholarship of a theoretical, applied, or technological nature, and provides a stimulating presentation of perspectives, research developments, overviews of important new technologies, and uses of major computational, information-based, and visualization innovations. Applied and theoretical contributions demonstrate the scope of computer-based analysis, fostering a better understanding of environmental and urban systems, their spatial scope, and their dynamics.