NeSF-Net: Building roof and facade segmentation based on neighborhood relationship awareness and scale-frequency modulation network for high-resolution remote sensing images
Authors: Yuan Zhou, Wanshou Jiang, Bin Wang
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 226, Pages 247-266
Publication date: 2025-05-26
Publication type: Journal Article
DOI: 10.1016/j.isprsjprs.2025.05.025
URL: https://www.sciencedirect.com/science/article/pii/S0924271625002126
Impact factor: 10.6 (JCR Q1, Geography, Physical; Region 1 category: Earth Sciences)
Open access: No
Citations: 0
Abstract
Building information extraction has significant application value in smart city development, urban planning, and management. As urbanization accelerates, mid- and high-rise buildings are increasingly prevalent. In orthophotos, the roofs of tall buildings often do not fully overlap with their footprints. In satellite images captured from oblique angles, buildings may also be occluded or affected by shadows. Building information extraction should therefore evolve from a roof-only task into a comprehensive task that covers both roofs and facades. Current methods predominantly employ convolutional neural networks (CNNs) and Transformer models and focus on describing building boundaries and global features. These methods have two limitations: they make insufficient use of the information between pixels, and their decoders have limited capacity to recover spatial information. As a result, roofs and facades are difficult to distinguish, and the morphological structure of buildings is hard to preserve. To address these issues, this paper proposes a new network architecture, NeSF-Net, designed for the accurate extraction of roofs and facades. NeSF-Net consists of two core modules: the neighborhood relationship awareness module (NRAM) and the scale-frequency modulation decoder (SFMD). NRAM enhances the connectivity between pixels by constructing sub-neighborhood relationship awareness in the latent space of deep features, which improves the integrity of the segmentation results. SFMD reduces the loss of spatial information during upsampling by extracting and integrating the scale and frequency features of buildings in the decoder. Experiments were conducted on the BANDON dataset, which contains images captured from oblique angles. The proposed method achieved an mIoU of 72.71% and an F1 score of 83.04%, outperforming state-of-the-art segmentation methods. The performance in facade extraction was particularly notable, with an mIoU exceeding that of the second-best method by 4.92%. Additionally, generalization experiments were conducted using GaoFen-7 satellite images, taking Shenzhen as a case study. The results demonstrate that the proposed method exhibits good generalization and robustness.
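For readers unfamiliar with the reported metrics, the following is a minimal sketch of how mIoU and a class-averaged F1 score are commonly computed from a confusion matrix. It is not taken from the paper: the class encoding (0 = background, 1 = roof, 2 = facade), the toy labels, the helper names, and the choice to average over all three classes are assumptions made for illustration, and the abstract does not state which classes enter the authors' averages.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows index the reference class, columns the prediction."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou_f1(cm: List[List[int]]) -> Tuple[List[float], List[float]]:
    """Return per-class IoU and F1 computed from a square confusion matrix."""
    n = len(cm)
    ious, f1s = [], []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # reference c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, reference other
        iou_den = tp + fp + fn
        f1_den = 2 * tp + fp + fn
        # Classes absent from both reference and prediction score 0.0 here.
        ious.append(tp / iou_den if iou_den else 0.0)
        f1s.append(2 * tp / f1_den if f1_den else 0.0)
    return ious, f1s


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical flattened label maps: 0 = background, 1 = roof, 2 = facade.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
ious, f1s = per_class_iou_f1(cm)
miou = mean(ious)   # unweighted mean over classes (assumed averaging scheme)
mf1 = mean(f1s)
print(f"per-class IoU: {ious}, mIoU: {miou:.4f}, mean F1: {mf1:.4f}")
```

Because IoU divides by TP + FP + FN while F1 divides by 2·TP + FP + FN, the F1 score of a class is never lower than its IoU, which is consistent with the reported F1 (83.04%) being higher than the reported mIoU (72.71%).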
About the journal:
The ISPRS Journal of Photogrammetry and Remote Sensing (P&RS) serves as the official journal of the International Society for Photogrammetry and Remote Sensing (ISPRS). It acts as a platform for scientists and professionals worldwide who are involved in various disciplines that utilize photogrammetry, remote sensing, spatial information systems, computer vision, and related fields. The journal aims to facilitate communication and dissemination of advancements in these disciplines, while also acting as a comprehensive source of reference and archive.
P&RS endeavors to publish high-quality, peer-reviewed research papers that are preferably original and have not been published before. These papers can cover scientific/research, technological development, or application/practical aspects. Additionally, the journal welcomes papers that are based on presentations from ISPRS meetings, as long as they are considered significant contributions to the aforementioned fields.
In particular, P&RS encourages the submission of papers that are of broad scientific interest, showcase innovative applications (especially in emerging fields), have an interdisciplinary focus, discuss topics that have received limited attention in P&RS or related journals, or explore new directions in scientific or professional realms. It is preferred that theoretical papers include practical applications, while papers focusing on systems and applications should include a theoretical background.