{"title":"CRFFNet: A cross-view reprojection based feature fusion network for fine-grained building segmentation using satellite-view and street-view data","authors":"Jinhua Yu , Junyan Ye , Yi Lin, Weijia Li","doi":"10.1016/j.inffus.2025.103795","DOIUrl":null,"url":null,"abstract":"<div><div>Fine-grained building attribute segmentation is crucial for rapidly acquiring urban geographic information and understanding urban development dynamics. To achieve a comprehensive perception of buildings, fusing cross-view data, which combines the wide coverage of satellite-view imagery with the detailed observations of street-view images, has become increasingly important. However, existing methods still struggle to effectively mitigate feature discrepancies across different views during cross-view fusion. To address this challenge, we propose the CRFFNet, a Cross-view Reprojection-based Feature Fusion Network for fine-grained building attribute segmentation. CRFFNet eliminates the perspective differences between satellite-view (satellite image and map data) and street-view features, enabling high-precision building attribute segmentation. Specifically, we introduce a deformable module to reduce target distortions in panoramic street-view images, and develop an Explicit Geometric Reprojection (EGR) module, which leverages explicit BEV geometric priors to reproject street-view features onto the satellite-view plane without requiring complex parameter inputs or depth information. To support evaluation, we construct two new datasets, Washington and Seattle, which include satellite imagery, map data, and panoramic street-view images, serving as benchmarks for cross-view, fine-grained building attribute segmentation. Extensive experiments conducted on these datasets, as well as on the public OmniCity and Brooklyn datasets, demonstrate that CRFFNet achieves mIoU improvements of 1.02% on Washington, 8.12% on Seattle, 2.29% on OmniCity, and 2.87% on Brooklyn compared to the second-best method. These improvements demonstrate the potential of our CRFFNet for applications involving large-scale multi-source data, contributing to more comprehensive urban analysis and planning.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"127 ","pages":"Article 103795"},"PeriodicalIF":15.5000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525008577","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Fine-grained building attribute segmentation is crucial for rapidly acquiring urban geographic information and understanding urban development dynamics. To achieve a comprehensive perception of buildings, fusing cross-view data, which combines the wide coverage of satellite-view imagery with the detailed observations of street-view images, has become increasingly important. However, existing methods still struggle to effectively mitigate feature discrepancies across different views during cross-view fusion. To address this challenge, we propose CRFFNet, a Cross-view Reprojection-based Feature Fusion Network for fine-grained building attribute segmentation. CRFFNet eliminates the perspective differences between satellite-view features (from satellite imagery and map data) and street-view features, enabling high-precision building attribute segmentation. Specifically, we introduce a deformable module to reduce target distortions in panoramic street-view images, and develop an Explicit Geometric Reprojection (EGR) module, which leverages explicit BEV geometric priors to reproject street-view features onto the satellite-view plane without requiring complex parameter inputs or depth information. To support evaluation, we construct two new datasets, Washington and Seattle, which include satellite imagery, map data, and panoramic street-view images, serving as benchmarks for cross-view, fine-grained building attribute segmentation. Extensive experiments conducted on these datasets, as well as on the public OmniCity and Brooklyn datasets, demonstrate that CRFFNet achieves mIoU improvements of 1.02% on Washington, 8.12% on Seattle, 2.29% on OmniCity, and 2.87% on Brooklyn compared to the second-best method. These improvements demonstrate the potential of CRFFNet for applications involving large-scale multi-source data, contributing to more comprehensive urban analysis and planning.
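To make the reprojection idea concrete, the sketch below illustrates ground-plane inverse perspective mapping, one common way to reproject panoramic street-view features onto a BEV grid using only explicit geometry (camera height and azimuth), with no depth input. This is a minimal illustrative example, not the paper's EGR module; the function name panorama_to_bev and all parameters (cam_height, bev_size, cell_m) are hypothetical assumptions for this sketch.

import numpy as np

def panorama_to_bev(feat, cam_height=2.5, bev_size=64, cell_m=0.5):
    # feat: (C, H, W) equirectangular street-view feature map; W spans 360 deg azimuth.
    # Returns a (C, bev_size, bev_size) top-down (satellite-view plane) grid centred on the camera.
    C, H, W = feat.shape
    bev = np.zeros((C, bev_size, bev_size), dtype=feat.dtype)
    half = bev_size // 2

    # Ground-plane coordinates (metres) of every BEV cell, camera at the grid centre.
    ys, xs = np.meshgrid(np.arange(bev_size), np.arange(bev_size), indexing="ij")
    dx = (xs - half) * cell_m          # east offset
    dy = (half - ys) * cell_m          # north offset
    dist = np.sqrt(dx ** 2 + dy ** 2) + 1e-6

    # Explicit geometric prior: azimuth picks the panorama column, and the
    # depression angle implied by the assumed camera height picks the row.
    azimuth = np.arctan2(dx, dy)                              # 0 rad = north
    col = ((azimuth + np.pi) / (2 * np.pi) * (W - 1)).astype(int)
    depression = np.arctan2(cam_height, dist)                 # angle below horizon
    row = np.clip((H / 2 + depression / (np.pi / 2) * (H / 2 - 1)).astype(int), 0, H - 1)

    bev[:, ys, xs] = feat[:, row, col]                        # nearest-neighbour sampling
    return bev

# Usage: reproject a random 128-channel panorama feature map onto a 64x64 BEV grid.
bev_feat = panorama_to_bev(np.random.rand(128, 64, 256).astype(np.float32))
print(bev_feat.shape)  # (128, 64, 64)

In practice such a ground-plane prior is only one possible instantiation; the key design point it illustrates is that the street-to-satellite mapping can be computed analytically, without learned depth or per-image calibration parameters.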
Journal Introduction
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers presenting fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.