Comprehensive urban space representation with varying numbers of street-level images

IF 7.1 · CAS Tier 1 (Earth Sciences) · JCR Q1 ENVIRONMENTAL STUDIES
Yingjing Huang, Fan Zhang, Yong Gao, Wei Tu, Fabio Duarte, Carlo Ratti, Diansheng Guo, Yu Liu
Journal: Computers, Environment and Urban Systems, Volume 106, Article 102043
DOI: 10.1016/j.compenvurbsys.2023.102043
Published: 2023-10-11
URL: https://www.sciencedirect.com/science/article/pii/S0198971523001060
Cited by: 0

Abstract

Street-level imagery has emerged as a valuable tool for observing large-scale urban spaces with unprecedented detail. However, previous studies have been limited to analyzing individual street-level images. This approach falls short in representing the characteristics of a spatial unit, such as a street or grid, which may contain varying numbers of street-level images ranging from several to hundreds. As a result, a more comprehensive and representative approach is required to capture the complexity and diversity of urban environments at different spatial scales. To address this issue, this study proposes a deep learning-based module called Vision-LSTM, which can effectively obtain vector representation from varying numbers of street-level images in spatial units. The effectiveness of the module is validated through experiments to recognize urban villages, achieving reliable recognition results (overall accuracy: 91.6%) through multimodal learning that combines street-level imagery with remote sensing imagery and social sensing data. Compared to existing image fusion methods, Vision-LSTM demonstrates significant effectiveness in capturing associations between street-level images. The proposed module can provide a more comprehensive understanding of urban spaces, enhancing the research value of street-level imagery and facilitating multimodal learning-based urban research. Our models are available at https://github.com/yingjinghuang/Vision-LSTM.
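The core idea described above — folding a variable number of per-image feature vectors (from several to hundreds per street or grid) into one fixed-size vector for the spatial unit — can be sketched with a minimal LSTM-style aggregator. This is an illustrative toy in NumPy, not the paper's released implementation; the class name `MiniVisionLSTM`, the dimensions, and the random feature inputs are all assumptions made here for demonstration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MiniVisionLSTM:
    """Toy sketch of a Vision-LSTM-style aggregator: fold a variable-length
    sequence of per-image feature vectors (e.g. CNN embeddings of the
    street-level images in one spatial unit) into a single fixed-size
    representation of that unit. Hyperparameters are illustrative only."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the four LSTM gates
        # (input, forget, cell candidate, output).
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, feat_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def aggregate(self, images):
        """images: (n_images, feat_dim) array; n_images varies per unit."""
        H = self.hidden_dim
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for x in images:
            z = self.W @ np.concatenate([x, h]) + self.b
            i = sigmoid(z[:H])          # input gate
            f = sigmoid(z[H:2 * H])     # forget gate
            g = np.tanh(z[2 * H:3 * H]) # cell candidate
            o = sigmoid(z[3 * H:])      # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        return h  # fixed-size vector regardless of n_images

# Usage: two spatial units with different image counts yield vectors
# of identical shape, ready to fuse with remote/social sensing features.
model = MiniVisionLSTM(feat_dim=8, hidden_dim=4)
rng = np.random.default_rng(1)
unit_a = model.aggregate(rng.normal(size=(3, 8)))   # 3 images
unit_b = model.aggregate(rng.normal(size=(120, 8))) # 120 images
```

The design point this illustrates is why a recurrent aggregator suits the problem: unlike fixed-slot fusion (concatenation or padding to a maximum count), the recurrence consumes any number of images and, unlike simple mean-pooling, can capture associations between images in the sequence.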

Source journal metrics:
CiteScore: 13.30
Self-citation rate: 7.40%
Articles per year: 111
Review time: 32 days
Journal description: Computers, Environment and Urban Systems is an interdisciplinary journal publishing cutting-edge and innovative computer-based research on environmental and urban systems that privileges the geospatial perspective. The journal welcomes original, high-quality scholarship of a theoretical, applied, or technological nature, and provides a stimulating presentation of perspectives, research developments, overviews of important new technologies, and uses of major computational, information-based, and visualization innovations. Applied and theoretical contributions demonstrate the scope of computer-based analysis, fostering a better understanding of environmental and urban systems, their spatial scope, and their dynamics.