{"title":"Multi-level urban street representation with street-view imagery and hybrid semantic graph","authors":"Yan Zhang , Yong Li , Fan Zhang","doi":"10.1016/j.isprsjprs.2024.09.032","DOIUrl":"10.1016/j.isprsjprs.2024.09.032","url":null,"abstract":"<div><div>Street-view imagery now densely covers cities. It provides a close-up perspective of the urban physical environment, allowing a comprehensive perception and understanding of cities. There has been a significant amount of effort to represent the urban physical environment based on street-view imagery, and this representation has been utilized to study the relationships between the physical environment, human dynamics, and socioeconomic environments. However, there are two key challenges in representing the urban physical environment of streets based on street-view images for downstream tasks. First, current research mainly focuses on the proportions of visual elements within the scene, neglecting the spatial adjacency between them. Second, the spatial dependency and spatial interaction between streets have not been adequately accounted for. These limitations hinder the effective representation and understanding of urban streets. To address these challenges, we propose a dynamic graph representation framework based on dual spatial semantics. At the intra-street level, we consider the spatial adjacency relationships of visual elements. Our method dynamically parses visual elements within the scene, achieving context-specific representations. At the inter-street level, we construct two spatial weight matrices by integrating the spatial dependency and the spatial interaction relationships. This comprehensively accounts for the hybrid spatial relationships between streets, enhancing the model’s ability to represent human dynamics and socioeconomic status. Furthermore, aside from these two modules, we also provide a spatial interpretability analysis tool for downstream tasks. 
A case study of our research framework shows that our method improves vehicle speed and flow estimation by 2.4% and 6.4%, respectively. This not only indicates that street-view imagery provides rich information about urban transportation but also offers a more accurate and reliable data-driven framework for urban studies. The code is available at: (<span><span>https://github.com/yemanzhongting/HybridGraph</span><svg><path></path></svg></span>).</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 19-32"},"PeriodicalIF":10.6,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PolyR-CNN: R-CNN for end-to-end polygonal building outline extraction","authors":"Weiqin Jiao, Claudio Persello, George Vosselman","doi":"10.1016/j.isprsjprs.2024.10.006","DOIUrl":"10.1016/j.isprsjprs.2024.10.006","url":null,"abstract":"<div><div>Polygonal building outline extraction has been a research focus in recent years. Most existing methods have addressed this challenging task by decomposing it into several subtasks and employing carefully designed architectures. Despite their accuracy, such pipelines often introduce inefficiencies during training and inference. This paper presents an end-to-end framework, denoted as PolyR-CNN, which offers an efficient and fully integrated approach to predict vectorized building polygons and bounding boxes directly from remotely sensed images. Notably, PolyR-CNN leverages solely the features of the Region of Interest (RoI) for the prediction, thereby mitigating the necessity for complex designs. Furthermore, we propose a novel scheme with PolyR-CNN to extract detailed outline information from polygon vertex coordinates, termed vertex proposal feature, to guide the RoI features to predict more regular buildings. PolyR-CNN demonstrates the capacity to deal with buildings with holes through a simple post-processing method on the Inria dataset. Comprehensive experiments conducted on the CrowdAI dataset show that PolyR-CNN achieves competitive accuracy compared to state-of-the-art methods while significantly improving computational efficiency, i.e., achieving 79.2 Average Precision (AP), exhibiting a 15.9 AP gain and operating 2.5 times faster and four times lighter than the well-established end-to-end method PolyWorld. Replacing the backbone with a simple ResNet-50, PolyR-CNN maintains a 71.1 AP while running four times faster than PolyWorld. 
The code is available at: <span><span>https://github.com/HeinzJiao/PolyR-CNN</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 33-43"},"PeriodicalIF":10.6,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep shared proxy construction hashing for cross-modal remote sensing image fast target retrieval","authors":"Lirong Han , Mercedes E. Paoletti , Sergio Moreno-Álvarez , Juan M. Haut , Antonio Plaza","doi":"10.1016/j.isprsjprs.2024.10.004","DOIUrl":"10.1016/j.isprsjprs.2024.10.004","url":null,"abstract":"<div><div>The diversity of remote sensing (RS) image modalities has expanded alongside advancements in RS technologies. A plethora of optical, multispectral, and hyperspectral RS images offer rich geographic class information. The ability to swiftly access multiple RS image modalities is crucial for fully harnessing the potential of RS imagery. In this work, an innovative method, called Deep Shared Proxy Construction Hashing (<span>DSPCH</span>), is introduced for cross-modal hyperspectral scene target retrieval using accessible RS images such as optical and sketch. Initially, a shared proxy hash code is generated in the hash space for each land use class. Subsequently, an end-to-end deep hash network is built to generate hash codes for hyperspectral pixels and accessible RS images. Furthermore, a proxy hash loss function is designed to optimize the proposed deep hashing network, aiming to generate hash codes that closely resemble the corresponding proxy hash code. Finally, two benchmark datasets are established for cross-modal hyperspectral and accessible RS image retrieval, allowing us to conduct extensive experiments with these datasets. 
Our experimental results validate that the novel <span>DSPCH</span> method can efficiently and effectively achieve RS image cross-modal target retrieval, opening up new avenues in the field of cross-modal RS image retrieval.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 44-56"},"PeriodicalIF":10.6,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection of buildings with potential damage using differential deformation maps","authors":"Saeedeh Shahbazi , Anna Barra , Qi Gao , Michele Crosetto","doi":"10.1016/j.isprsjprs.2024.10.008","DOIUrl":"10.1016/j.isprsjprs.2024.10.008","url":null,"abstract":"<div><div>The European Ground Motion Service (EGMS) is a crucial component of the systematic monitoring and quantification of land displacement across Europe. By using Sentinel-1 full-resolution images, EGMS offers a reliable, consistent, and annually updated dataset for detecting natural and anthropogenic ground motion phenomena. While the Copernicus platform grants free accessibility to EGMS displacement maps containing a massive number of measurement points, the challenge lies in finding appropriate methodologies and tools to make this wealth of information easily exploitable, especially to non-experts. This study leverages the EGMS displacement maps to demonstrate the capability of a novel software tool designed for the automatic identification of urban structures that may be susceptible to damage due to differential movements. This approach is notable since it can be applied to areas of any size, from local to national scale, and while designed for EGMS data, it can be applied to any other InSAR-derived displacement maps, regardless of the data source. Unlike prior research on differential settlements, this study concentrates on analysing every single building and computing the spatial gradient of deformation by using Measurement Points specifically associated with each building. The designed software tool quickly analyses and detects at-risk buildings, applying a classification system that categorizes them based on the severity of their spatial deformation gradient, with uncertainty as the primary factor, and generates a differential deformation map as a result. 
The Monte Carlo simulations assisted us in estimating the standard deviation of the spatial gradient, which was determined to be 0.05 mm×yr<sup>−1</sup>×m<sup>−1</sup>. The approach was tested with EGMS data over the area of Barcelona Municipality (Spain) from 2017 to 2021. The complete source code is available at <span><span><u>https://github.com/saeedehshahbazi/detecting-differential-deformation.git</u></span><svg><path></path></svg></span>. In addition, a COSMO-SkyMed dataset was used to assess and appraise its performance. The number of detected buildings was 149 in the EGMS dataset and 155 in the COSMO-SkyMed dataset, with approximately 50% of the buildings classified as Very-Low/Low in both datasets. The results demonstrate the technique’s robustness, as it yielded consistent outcomes using distinct datasets. To verify the differential deformation map, a field survey was conducted to observe any evident structural damage (e.g., cracks or fractures). The differential deformation maps may represent a primary source of information for conducting a comprehensive analysis and vulnerability/risk assessment in urban areas. 
The effectiveness of this technique, as evidenced by consistent outcomes with diverse datasets, underlines its value in improving our understanding of structural vulnerabilities, thus contributing to informed urban management.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 57-69"},"PeriodicalIF":10.6,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142534854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A robust method for mapping soybean by phenological aligning of Sentinel-2 time series","authors":"Xin Huang, Anton Vrieling, Yue Dou, Mariana Belgiu, Andrew Nelson","doi":"10.1016/j.isprsjprs.2024.10.015","DOIUrl":"10.1016/j.isprsjprs.2024.10.015","url":null,"abstract":"<div><div>Soybean is an important crop for food and animal feed. Production and area both continue to increase and expand into new areas and countries. Spatially explicit information on soybean cultivation is essential to crop monitoring, production estimation, and national accounting systems. However, its cultivation in diverse climate conditions, landscapes, and agricultural systems poses challenges to accurately map soybean across different regions and years. We propose an innovative soybean mapping method combining phenological alignment with machine learning (named here RF-DTW), which can be applied to diverse geographies and years by aligning phenological shifts and using distinctive features from Sentinel-2 time-series. The method first uses the dynamic time warping (DTW) algorithm to align the growing season between pixels across different sites. Then, based on the harmonized time-series, a set of distinctive features was identified and used to build random forest (RF) models to classify soybean across ten globally distributed sites and multiple years. Results show that the green chlorophyll vegetation index (GCVI), greenness and water content composite index (GWCCI), normalized difference senescent vegetation index (NDSVI), red edge position (REP), and short-wave infrared bands are important inputs for distinguishing soybean from other crops. Spectral-phenological features, particularly the curve slope metrics of GCVI and GWCCI during the peak to late growing season, rank as the most important features for mapping soybean. RF-DTW demonstrates good generalizability across ten study sites, achieving an overall accuracy (OA) of 0.92 and an F1-score of 0.84. 
F1-scores for eight out of ten sites ranged between 0.82 and 0.98, outperforming a benchmark method, although they were lower (F1-score < 0.60) for the two sites in Sub-Saharan Africa. Additionally, RF-DTW performs robustly when transferred to untrained regions and years, with most cases showing an F1-score higher than 0.70. Our proposed method, as a combination of phenological alignment and machine learning, can be used to map soybean accurately and efficiently across different regions and years, to provide crucial information for understanding the rapid dynamics of soybean cultivation and its global-scale impacts.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 1-18"},"PeriodicalIF":10.6,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142445271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global perspectives on sand dune patterns: Scale-adaptable classification using Landsat imagery and deep learning strategies","authors":"Zhijia Zheng , Xiuyuan Zhang , Jiajun Li , Eslam Ali , Jinsongdi Yu , Shihong Du","doi":"10.1016/j.isprsjprs.2024.10.002","DOIUrl":"10.1016/j.isprsjprs.2024.10.002","url":null,"abstract":"<div><div>Sand dune patterns (SDPs) are spatial aggregations of dunes and interdunes, exhibiting distinct morphologies and spatial structures. Recognizing global SDPs is crucial for understanding the development processes, contributing factors, and self-organization characteristics of aeolian systems. However, the diversity, complexity, and multiscale nature of global SDPs pose significant technical challenges in the classification scheme, sample collection, feature representation, and classification method. This study addresses these challenges by developing a novel global SDP classification approach based on an advanced deep-learning network. Firstly, we established a globally applicable SDP classification scheme that accommodates the diverse nature of SDPs. Secondly, we developed an SDP semantic segmentation sample dataset, which encompassed a wide array of SDP representations. Thirdly, we deployed the SegFormer network to automatically capture detailed dune structures and developed a weighted voting strategy to ensure scale adaptability. Experiments utilizing Landsat-8 imagery yielded a commendable overall accuracy (OA) of 85.43%. Notably, most SDP types exhibited high classification accuracies, such as star dunes (97.43%) and simple linear dunes (87.17%). The weighted voting strategy prioritized the predictions of each type, resulting in a 1.41% to 7.91% improvement in OA compared to the single-scale classification and average voting methods. 
This innovative approach facilitated the generation of a high-quality, fine-grained, and global-scale SDP map at 30 m resolution (GSDP30), which not only directly provides the spatial distribution of global SDPs but also serves as valuable support for understanding aeolian processes. This study represents the first instance of producing such a comprehensive and globally applicable SDP map at this fine resolution.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 781-801"},"PeriodicalIF":10.6,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mineral detection based on hyperspectral remote sensing imagery on Mars: From detection methods to fine mapping","authors":"Tian Ke , Yanfei Zhong , Mi Song , Xinyu Wang , Liangpei Zhang","doi":"10.1016/j.isprsjprs.2024.09.020","DOIUrl":"10.1016/j.isprsjprs.2024.09.020","url":null,"abstract":"<div><div>Hyperspectral remote sensing is a commonly used technical means for mineral detection on the Martian surface, which has important implications for the study of Martian geological evolution and the search for potential biological signatures. The increasing volume of Martian remote sensing data and complex issues such as the intimate mixture of Martian minerals make research on Martian mineral detection challenging. This paper summarizes the existing achievements by analyzing the papers published in recent years and looks ahead to future research directions. Specifically, this paper introduces the currently used hyperspectral remote sensing data of Mars and systematically analyzes the characteristics and distribution of Martian minerals. The existing methods are then divided into two groups, according to their core idea, i.e., methods based on pixels and methods based on subpixels. In addition, some applications of Martian mineral detection at global and local scales are analyzed. Furthermore, the various typical methods are compared using synthetic and real data to assess their performance. The conclusion is drawn that approaches based on spectral unmixing are more applicable to areas with limited and unknown mineral categories than pixel-based methods. Among them, the fully autonomous hyperspectral unmixing method can improve the overall accuracy in real CRISM images and has great potential for Martian mineral detection. The development trends are analyzed from three aspects. 
Firstly, in terms of data, a more complete spectral library, covering more spectral information of the Martian surface minerals, should be constructed to assist with mineral detection. Secondly, in terms of methods, spectral unmixing methods based on a nonlinear mixing model and a new generation of data-driven detection paradigms guided by Mars mineral knowledge should be developed. Finally, in terms of application, the global mapping of Martian minerals toward a more intelligent, global scale, and refined direction should be targeted in the future. The data and source code in the experiment are available at <span><span>http://rsidea.whu.edu.cn/Martian_mineral_detection.htm</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 761-780"},"PeriodicalIF":10.6,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A highly efficient index for robust mapping of tidal flats from sentinel-2 images directly","authors":"Pengfei Tang , Shanchuan Guo , Peng Zhang , Lu Qie , Xiaoquan Pan , Jocelyn Chanussot , Peijun Du","doi":"10.1016/j.isprsjprs.2024.10.005","DOIUrl":"10.1016/j.isprsjprs.2024.10.005","url":null,"abstract":"<div><div>As an essential component of the intertidal zone, tidal flats (TFs) are resource-rich areas with the most intense material and energy exchanges. However, due to the dual threats of human activities and extreme climate conditions, TFs are disappearing on a large scale. Despite their importance, accurately mapping TFs has proved challenging due to their complex and dynamic nature. Tidal influences significantly enhance the diversity and variability of TFs, and suspended particulates introduce turbidity that challenges conventional indices used for distinguishing between water and land. This study focuses on the world’s largest intertidal sedimentary system located along the central coast of Jiangsu, an area characterized by complex sedimentary features and dynamic TF conditions. Through quantitative analysis of the spectral characteristics of TFs across different years, seasons, and tidal stages, this study identifies two unique spectral features of TFs: uniformly low reflectance values and a trapezoidal spectral shape. Leveraging the low reflectance, the flatness of the middle segment in the trapezoidal spectral shape, and the initial increase followed by a decreasing trend across critical bands, a novel Tidal Flat Index (TFI) has been developed. Experimental results indicate that TFI is suitable for robust and direct TF mapping across years, seasons, and tidal stages, achieving F1 scores exceeding 0.95 in 12 different scenarios. Compared to other indices and rule-based methods, TFI offers greater accuracy and threshold stability, as well as better background and cloud suppression. 
The study also extends to other globally rich TFs regions to demonstrate the universality and applicability of the proposed index in various environments, including its effectiveness in delineating annual TFs extents. This study offers technical support for the automatic mapping of TFs based on single Sentinel-2 multispectral images.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 742-760"},"PeriodicalIF":10.6,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving drone-based uncalibrated estimates of wheat canopy temperature in plot experiments by accounting for confounding factors in a multi-view analysis","authors":"Simon Treier , Juan M. Herrera , Andreas Hund , Norbert Kirchgessner , Helge Aasen , Achim Walter , Lukas Roth","doi":"10.1016/j.isprsjprs.2024.09.015","DOIUrl":"10.1016/j.isprsjprs.2024.09.015","url":null,"abstract":"<div><div>Canopy temperature (CT) is an integrative trait, indicative of the relative fitness of a plant genotype to the environment. Lower CT is associated with higher yield, biomass and generally a higher performing genotype. In view of changing climatic conditions, measuring CT is becoming increasingly important in breeding and variety testing. Ideally, CTs should be measured as simultaneously as possible in all genotypes to avoid any bias resulting from changes in environmental conditions. The use of thermal cameras mounted on drones allows to measure large experiments in a short time. Uncooled thermal cameras are sufficiently lightweight to be mounted on drones. However, such cameras are prone to thermal drift, where the measured temperature changes with the conditions the sensor is exposed to. Thermal drift and changing environmental conditions impede precise and consistent thermal measurements with uncooled cameras. Furthermore, the viewing geometry of images affects the ratio between pixels showing soil or plants. Particularly for row crops such as wheat, changing viewing geometries will increase CT uncertainties. Restricting the range of viewing geometries can potentially reduce these effects. In this study, sequences of repeated thermal images were analyzed in a multi-view approach which allowed to extract information on trigger timing and viewing geometry for individual measurements. We propose a mixed model approach that can account for temporal drift and viewing geometry by including temporal and geometric covariates. 
This approach improved the consistency and genotype specificity of CT measurements compared to approaches relying on orthomosaics in a two-year field variety testing trial with winter wheat. The correlations between independent measurements taken within 20 min reached 0.99, and heritabilities reached 0.95. Selecting measurements with oblique viewing geometries for analysis can reduce the influence of soil background. The proposed workflow provides a lean phenotyping method to collect high-quality CT measurements in terms of ranking consistency and heritability with an affordable thermal camera by incorporating available additional information from drone-based mapping flights in a post-processing step.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 721-741"},"PeriodicalIF":10.6,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAMPolyBuild: Adapting the Segment Anything Model for polygonal building extraction","authors":"Chenhao Wang , Jingbo Chen , Yu Meng , Yupeng Deng , Kai Li , Yunlong Kong","doi":"10.1016/j.isprsjprs.2024.09.018","DOIUrl":"10.1016/j.isprsjprs.2024.09.018","url":null,"abstract":"<div><div>Extracting polygonal buildings from high-resolution remote sensing images is a critical task for large-scale mapping, 3D city modeling, and various geographic information system applications. Traditional methods are often restricted in accurately delineating boundaries and exhibit limited generalizability, which can affect their real-world applicability. The Segment Anything Model (SAM), a promptable segmentation model trained on an unprecedentedly large dataset, demonstrates remarkable generalization ability across various scenarios. In this context, we present SAMPolyBuild, an innovative framework that adapts SAM for polygonal building extraction, allowing for both automatic and prompt-based extraction. To fulfill the requirement for object location prompts in SAM, we developed the Auto Bbox Prompter, which is trained to detect building bounding boxes directly from the image encoder features of the SAM. The boundary precision of the SAM mask results was insufficient for vector polygon extraction, especially when challenged by blurry edges and tree occlusions. Therefore, we extended the SAM decoder with additional parameters to enable multitask learning to predict masks and generate Gaussian vertex and boundary maps simultaneously. Furthermore, we developed a mask-guided vertex connection algorithm to generate the final polygon. Extensive evaluation on the WHU-Mix vector dataset and SpaceNet datasets demonstrate that our method achieves a new state-of-the-art in terms of accuracy and generalizability, significantly improving average precision (AP), average recall (AR), intersection over union (IoU), boundary F1, and vertex F1 metrics. 
Moreover, by combining the automatic and prompt modes of our framework, we found that 91.2% of the building polygons predicted by SAMPolyBuild on out-of-domain data closely match the quality of manually delineated polygons. The source code is available at <span><span>https://github.com/wchh-2000/SAMPolyBuild</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"218 ","pages":"Pages 707-720"},"PeriodicalIF":10.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142426649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}