{"title":"基于街景影像和卫星影像的城市功能区识别统一多模态学习方法","authors":"Jiajun Chen, Runyu Fan, Hongyang Niu, Zijian Xu, Jining Yan, Weijing Song, Ruyi Feng","doi":"10.1016/j.jag.2025.104685","DOIUrl":null,"url":null,"abstract":"<div><div>Urban functional zones (UFZ) are areas that divide urban space into specific uses based on the distribution of different human activities and infrastructure. UFZ mapping is to analyze the geographic information data of urban space, combine remote sensing images (RSI), point of interest (POI) data and other data sources, and use advanced spatial analysis technology to divide and visualize the UFZ. The intelligent interpretation of UFZ can provide support for urban management and planning. Previous studies on UFZ mainly focused on using remote sensing images and POI data, which can obtain the city’s macroscopic remote sensing visual features and the distribution of land use. However, these methods often ignore the inner-street details due to the absence of using inner-street perspective data and cannot capture the complex spatial relations between objects in complex urban scenes, resulting in unsatisfied UFZ results. For this purpose, we propose a unified multimodal learning method to interpret UFZ by combining remote sensing images, POI data, and street view data with inner-street details to provide a more comprehensive perspective to boost UFZ interpretation. To make full use of the inner-street perspective advantage of street view images (SVI), we not only use their visual features but also extract textual features that can reflect various human activities in street views through image captioning technology, better to capture the subtle socio-economic activity information in urban space. We conduct extensive experiments in Wuhan, Changsha, and Nanchang. The OA of this method on the test set reached 91.80%. Experimental results show a significant improvement in the model’s performance in interpreting UFZ.</div></div>","PeriodicalId":73423,"journal":{"name":"International journal of applied earth observation and geoinformation : ITC journal","volume":"142 ","pages":"Article 104685"},"PeriodicalIF":8.6000,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A unified multimodal learning method for urban functional zone identification by fusing inner-street visual–textual information from street-view and satellite images\",\"authors\":\"Jiajun Chen, Runyu Fan, Hongyang Niu, Zijian Xu, Jining Yan, Weijing Song, Ruyi Feng\",\"doi\":\"10.1016/j.jag.2025.104685\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Urban functional zones (UFZ) are areas that divide urban space into specific uses based on the distribution of different human activities and infrastructure. UFZ mapping is to analyze the geographic information data of urban space, combine remote sensing images (RSI), point of interest (POI) data and other data sources, and use advanced spatial analysis technology to divide and visualize the UFZ. The intelligent interpretation of UFZ can provide support for urban management and planning. Previous studies on UFZ mainly focused on using remote sensing images and POI data, which can obtain the city’s macroscopic remote sensing visual features and the distribution of land use. 
However, these methods often ignore the inner-street details due to the absence of using inner-street perspective data and cannot capture the complex spatial relations between objects in complex urban scenes, resulting in unsatisfied UFZ results. For this purpose, we propose a unified multimodal learning method to interpret UFZ by combining remote sensing images, POI data, and street view data with inner-street details to provide a more comprehensive perspective to boost UFZ interpretation. To make full use of the inner-street perspective advantage of street view images (SVI), we not only use their visual features but also extract textual features that can reflect various human activities in street views through image captioning technology, better to capture the subtle socio-economic activity information in urban space. We conduct extensive experiments in Wuhan, Changsha, and Nanchang. The OA of this method on the test set reached 91.80%. Experimental results show a significant improvement in the model’s performance in interpreting UFZ.</div></div>\",\"PeriodicalId\":73423,\"journal\":{\"name\":\"International journal of applied earth observation and geoinformation : ITC journal\",\"volume\":\"142 \",\"pages\":\"Article 104685\"},\"PeriodicalIF\":8.6000,\"publicationDate\":\"2025-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International journal of applied earth observation and geoinformation : ITC journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1569843225003322\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"REMOTE SENSING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International journal of applied earth observation and geoinformation : ITC journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1569843225003322","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"REMOTE SENSING","Score":null,"Total":0}
Urban functional zones (UFZ) partition urban space into areas of specific use according to the distribution of human activities and infrastructure. UFZ mapping analyzes geographic information about urban space by combining remote sensing images (RSI), point-of-interest (POI) data, and other data sources, and applies spatial analysis techniques to delineate and visualize the zones. Intelligent interpretation of UFZ can support urban management and planning. Previous UFZ studies mainly relied on remote sensing images and POI data, which capture a city's macroscopic visual features and land-use distribution. However, because they do not use inner-street perspective data, these methods often miss inner-street details and cannot capture the complex spatial relations between objects in complex urban scenes, leading to unsatisfactory UFZ results. To address this, we propose a unified multimodal learning method that interprets UFZ by combining remote sensing images, POI data, and street view data containing inner-street details, providing a more comprehensive perspective for UFZ interpretation. To make full use of the inner-street perspective of street view images (SVI), we use not only their visual features but also textual features extracted through image captioning, which reflect the human activities visible in street views and better capture the subtle socio-economic activity information in urban space. We conduct extensive experiments in Wuhan, Changsha, and Nanchang. The overall accuracy (OA) of this method on the test set reaches 91.80%. Experimental results show a significant improvement in the model's performance in interpreting UFZ.
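The abstract describes fusing satellite-image features, POI data, and street-view visual and caption-derived text features for UFZ classification. The following is a minimal, hypothetical late-fusion sketch in PyTorch; the class name, feature dimensions, and fusion-by-concatenation design are assumptions chosen for illustration and do not represent the paper's actual architecture.

# Illustrative sketch only (not the authors' model): late fusion of
# satellite-image, street-view, caption-text, and POI features for UFZ
# classification. All dimensions below are placeholder assumptions.
import torch
import torch.nn as nn


class SimpleUFZFusion(nn.Module):
    def __init__(self, num_classes, rsi_dim=512, svi_dim=512,
                 text_dim=256, poi_dim=64, hidden_dim=256):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.rsi_proj = nn.Linear(rsi_dim, hidden_dim)    # satellite-image features
        self.svi_proj = nn.Linear(svi_dim, hidden_dim)    # street-view visual features
        self.text_proj = nn.Linear(text_dim, hidden_dim)  # caption-text embeddings
        self.poi_proj = nn.Linear(poi_dim, hidden_dim)    # POI category histogram
        # Classify the concatenated modality representations.
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(4 * hidden_dim, num_classes),
        )

    def forward(self, rsi_feat, svi_feat, text_feat, poi_feat):
        fused = torch.cat([
            self.rsi_proj(rsi_feat),
            self.svi_proj(svi_feat),
            self.text_proj(text_feat),
            self.poi_proj(poi_feat),
        ], dim=-1)
        return self.classifier(fused)


# Usage with random placeholder features: a batch of 8 zones, 5 UFZ classes.
model = SimpleUFZFusion(num_classes=5)
logits = model(torch.randn(8, 512), torch.randn(8, 512),
               torch.randn(8, 256), torch.randn(8, 64))
print(logits.shape)  # torch.Size([8, 5])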
Journal introduction:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.