Multi-modal contrastive learning of urban space representations from POI data
Xinglei Wang, Tao Cheng, Stephen Law, Zichao Zeng, Lu Yin, Junyuan Liu
Computers, Environment and Urban Systems, Volume 120, Article 102299. Published 2025-04-30. Impact factor: 7.1 (JCR Q1, Environmental Studies).
DOI: 10.1016/j.compenvurbsys.2025.102299
https://www.sciencedirect.com/science/article/pii/S0198971525000523
Abstract
Understanding and characterising the urban environment is crucial for urban planning and geospatial analysis. One common approach is to use point of interest (POI) data, which offers rich information about the spatial-semantic characteristics of urban spaces. Existing methods for learning urban space representations from POIs face several limitations, including reliance on predefined spatial units, disregard of POI location information, underutilisation of POI semantic attributes, and computational inefficiencies. To address these gaps, we propose CaLLiPer (Contrastive Language-Location Pre-training), a novel approach that directly embeds continuous urban spaces into vector representations that capture the spatial and semantic characteristics of the urban environment. This model leverages multimodal contrastive learning to align location embeddings with textual descriptions of POIs, bypassing the need for complex training corpus construction and negative sampling. Applying CaLLiPer to learning urban space representations in London, UK, we demonstrate a 5–15% improvement in predictive performance for land use classification and socioeconomic mapping tasks compared to state-of-the-art methods. Visualisations and correlation analysis of the learned representations further verify our model's ability to capture spatial variations in urban semantics with high accuracy and fine resolution. Moreover, CaLLiPer achieves reduced training time, showcasing its efficiency and scalability. Additional experiments demonstrate the robustness of our model across different spatial scales and urban contexts. Notably, the experiment on Singapore showed an improvement of over 20%. This work also provides a promising pathway for scalable, semantically rich urban space representation learning that can support the development of geospatial foundation models. The implementation code is available at https://github.com/xlwang233/CaLLiPer.
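The alignment of location embeddings with POI text embeddings described in the abstract can be illustrated with a CLIP-style symmetric contrastive (InfoNCE) loss. The sketch below is an assumption for illustration only: the abstract does not state the exact loss, encoders, or temperature used by CaLLiPer, and the function names, the temperature of 0.07, and the random toy embeddings are all hypothetical. It shows how the other pairs in a batch can serve as contrast, so no separate negative-sampling step is needed.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_contrastive_loss(loc_emb, text_emb, temperature=0.07):
    # Row i of loc_emb and row i of text_emb are assumed to describe the same POI.
    loc = l2_normalize(loc_emb)
    txt = l2_normalize(text_emb)
    logits = loc @ txt.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])
    # Location-to-text: each row should pick out its own column.
    loss_loc_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Text-to-location: each column should pick out its own row.
    loss_text_to_loc = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_loc_to_text + loss_text_to_loc)

# Toy data (hypothetical): 8 POIs with 16-dimensional embeddings.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))
aligned_loc = text_emb + 0.01 * rng.normal(size=(8, 16))  # nearly matched pairs
shuffled_loc = np.roll(text_emb, 1, axis=0)               # deliberately mismatched pairs

loss_aligned = symmetric_contrastive_loss(aligned_loc, text_emb)
loss_shuffled = symmetric_contrastive_loss(shuffled_loc, text_emb)
print(loss_aligned < loss_shuffled)
```

In this toy setup the nearly matched pairs yield a lower loss than the mismatched ones, which is the behaviour a contrastive objective of this kind rewards during pre-training.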
About the journal:
Computers, Environment and Urban Systems is an interdisciplinary journal publishing cutting-edge and innovative computer-based research on environmental and urban systems that privileges the geospatial perspective. The journal welcomes original, high-quality scholarship of a theoretical, applied or technological nature, and provides a stimulating presentation of perspectives, research developments, overviews of important new technologies and uses of major computational, information-based, and visualization innovations. Applied and theoretical contributions demonstrate the scope of computer-based analysis fostering a better understanding of environmental and urban systems, their spatial scope and their dynamics.