Title: Omni-Scene Infrared Vehicle Detection: An Efficient Selective Aggregation approach and a unified benchmark
Authors: Nan Zhang, Borui Chai, Jiamin Song, Tian Tian, Pengfei Zhu, Jiayi Ma, Jinwen Tian
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 223, Pages 244-260, published 2025-03-19. DOI: 10.1016/j.isprsjprs.2025.03.002
Abstract: Vehicle detection in infrared aerial imagery is essential for military and civilian applications due to its effectiveness in low-light and adverse scenarios. However, the low spectral and pixel resolution of long-wave infrared (LWIR) results in limited information compared to visible light, causing significant background interference. Moreover, varying thermal radiation from vehicle movement and environmental factors creates diverse vehicle patterns, complicating accurate detection and recognition. To address these challenges, we propose the Omni-Scene Infrared Vehicle Detection Network (OSIV-Net), a framework optimized for scene adaptability in infrared vehicle detection. The core architecture of OSIV-Net employs Efficient Selective Aggregation Blocks (ESABlocks), combining Anchor-Adaptive Convolution (A²Conv) in shallow layers and the Magic Cube Module (MCM) in deeper layers to accurately capture and selectively aggregate features. A²Conv captures the local intrinsic and variable patterns of vehicles by combining differential and dynamic convolutions, while MCM flexibly integrates global features from three dimensions of the feature map. In addition, we constructed the Omni-Scene Infrared Vehicle (OSIV) dataset, the most comprehensive infrared aerial vehicle dataset to date, with 39,583 images spanning nine distinct scenes and over 617,000 annotated vehicle instances across five categories, providing a robust benchmark for advancing infrared vehicle detection across varied environments. Experimental results on the DroneVehicle and OSIV datasets demonstrate that OSIV-Net achieves state-of-the-art (SOTA) performance across various scenarios. Specifically, it attains 82.60% mAP@0.5 on the DroneVehicle dataset, surpassing the previous infrared-modality SOTA method DTNet by +4.27% and the multi-modal SOTA method MGMF by +2.3%. On the OSIV dataset, it attains an average performance of 78.14% across all scenarios, outperforming DTNet by +6.13%. The dataset and code can be downloaded from https://github.com/rslab1111/OSIV.
{"title":"InSAR estimates of excess ground ice concentrations near the permafrost table","authors":"S. Zwieback , G. Iwahana , Q. Chang , F. Meyer","doi":"10.1016/j.isprsjprs.2025.03.004","DOIUrl":"10.1016/j.isprsjprs.2025.03.004","url":null,"abstract":"<div><div>Ground ice melt can reshape permafrost environments, with repercussions for Northern livelihoods and infrastructure. However, fine-scale permafrost ground ice products are lacking, limiting environmental change predictions. We propose an InSAR-based approach for estimating ground ice near the permafrost table in sparsely vegetated terrain underlain by continuous permafrost. The Bayesian inversion retrieves ice content by matching the subsidence predicted by a forward model to InSAR observations, accounting for atmospheric, decorrelation, and model parameter uncertainty. We specifically estimate the excess ice concentration of materials that thaw at the end of summer; in summers with deep thaw, these materials overlap with the previous years’ upper permafrost. In a very warm summer in Northwestern Alaska, Sentinel-1 retrievals showed average excess ice concentrations of, respectively, 0.4 and 0.0 in locations independently determined to be ice-rich and ice-poor. In ice-rich locations, the estimates were lower in the preceding warm summer, indicating the thaw front rarely penetrated deep into the ice-rich intermediate layer. Performance was sensitive to the density of stable reference points for atmospheric correction, with deviations of up to 0.3 and increased uncertainty when fewer reference points were used. Toward filling gaps and mitigating InSAR retrieval errors far from reference points, we determined the predictability of the InSAR ice concentrations from topographic and optical surface proxies, finding a moderate <span><math><msup><mrow><mi>R</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span> of 0.6, with slope being the most important predictor. In summary, the InSAR inversion provides quantitative ice concentration estimates near the permafrost table independent of surface manifestations of ground ice, in-situ observations and geological information. Its combination with optical remote sensing and geological information has the potential to provide seamless, fine-scale permafrost ground ice products.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"223 ","pages":"Pages 261-273"},"PeriodicalIF":10.6,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143644233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: TRSP: Texture reconstruction algorithm driven by prior knowledge of ground object types
Authors: Zhendong Liu, Liang Zhai, Jie Yin, Xiaoli Liu, Shilong Zhang, Dongyang Wang, Abbas Rajabifard, Yiqun Chen
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 223, Pages 221-243, published 2025-03-19. DOI: 10.1016/j.isprsjprs.2025.03.015
Abstract: Texture reconstruction algorithms use multiview images and 3D geometric surface models as data sources to establish the mapping relationship and texture consistency constraints between 2D images and 3D geometric surfaces, producing a 3D surface model with realistic color. Existing algorithms still struggle with texture quality in dynamic scenes with complex outdoor features and varying lighting environments. In this paper, a texture reconstruction algorithm driven by prior knowledge of ground object types is proposed. First, a multiscale and multifactor joint screening strategy is constructed to generate sparse key scenes for occlusion perception. Second, globally consistent 3D semantic mapping rules and semantic similarity measures are proposed: the multiview 2D image semantic segmentation results are refined, fused, and mapped into 3D semantic category information. Then, the 3D model semantic information is introduced to construct an energy function based on the prior knowledge of ground objects, and the color of texture block boundaries is adjusted. Experimental verification and analysis are conducted on public and real-world datasets. Compared with well-known algorithms such as Allene, Waechter, and OpenMVS, the core texture quality indicators of the proposed algorithm are reduced by 57.14%, 53.24% and 50.69%, respectively, and it performs best in terms of clarity and contrast of texture details; the effective culling rate of moving objects is about 80%-88.9%, the texture mapping is cleaner, and redundant computation is significantly reduced.
Title: GOOD: Towards domain generalized oriented object detection
Authors: Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 223, Pages 207-220, published 2025-03-18. DOI: 10.1016/j.isprsjprs.2025.02.025
Abstract: Oriented object detection has developed rapidly in the past few years, but most methods assume that the training and testing images follow the same statistical distribution, which is far from reality. In this paper, we propose the task of domain generalized oriented object detection, which aims to explore the generalization of oriented object detectors to arbitrary unseen target domains. Learning domain generalized oriented object detectors is particularly challenging, as cross-domain style variation not only negatively impacts the content representation but also leads to unreliable orientation predictions. To address these challenges, we propose a generalized oriented object detector (GOOD). After style hallucination by the emerging contrastive language-image pre-training (CLIP), it consists of two key components, namely, rotation-aware content consistency learning (RAC) and style consistency learning (SEC). The proposed RAC allows the oriented object detector to learn stable orientation representations from style-diversified samples. The proposed SEC further stabilizes the generalization ability of the content representation across different image styles. Notably, both learning objectives are simple, straightforward, and easy to implement. Extensive experiments on multiple cross-domain settings show the state-of-the-art performance of GOOD. Source code is available at https://github.com/BiQiWHU/GOOD.
Title: Semantically-Aware Contrastive Learning for multispectral remote sensing images
Authors: Leandro Stival, Ricardo da Silva Torres, Helio Pedrini
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 223, Pages 173-187, published 2025-03-18. DOI: 10.1016/j.isprsjprs.2025.02.024
Abstract: Satellites continuously capture vast amounts of data daily, including multispectral remote sensing images (MSRSI), which facilitate the analysis of planetary processes and changes. New machine-learning techniques are employed to develop models that identify regions with significant changes, predict land-use conditions, and segment areas of interest. However, these methods often require large volumes of labeled data for effective training, limiting the practical utilization of the captured data. According to the current literature, self-supervised learning (SSL) can be effectively applied to learn representations of MSRSI. This work introduces Semantically-Aware Contrastive Learning (SACo+), a novel method for training a model on MSRSI with SSL. Relevant known band combinations are utilized to extract semantic information from the MSRSI, together with texture-based representations, serving as anchors for constructing a feature space. This approach is resilient against changes and yields semantically informative results using contrastive techniques based on sample visual properties, their categories, and their changes over time. This enables training the model with classic SSL contrastive frameworks, such as MoCo and its remote sensing version, SeCo, while also leveraging intrinsic semantic information. SACo+ generates features for each semantic group (band combination), highlighting regions in the images (such as vegetation, urban areas, and water bodies), and explores texture properties encoded with Local Binary Patterns (LBP). To demonstrate the efficacy of our approach, we trained ResNet models on MSRSI using the semantic band combinations in SSL frameworks. Subsequently, we compared these models on three distinct tasks: land cover classification using the EuroSAT dataset, change detection using the OSCD dataset, and semantic segmentation using the PASTIS and GID datasets. Our results demonstrate that leveraging semantic and texture features enhances the quality of the feature space, leading to improved performance in all benchmark tasks. The model implementation and weights are available at https://github.com/lstival/SACo (as of Jan. 2025).
{"title":"Map-Assisted remote-sensing image compression at extremely low bitrates","authors":"Yixuan Ye, Ce Wang, Wanjie Sun, Zhenzhong Chen","doi":"10.1016/j.isprsjprs.2025.03.005","DOIUrl":"10.1016/j.isprsjprs.2025.03.005","url":null,"abstract":"<div><div>Remote-sensing (RS) image compression at extremely low bitrates has always been a challenging task in practical scenarios like edge device storage and narrow bandwidth transmission. Generative models including VAEs and GANs have been explored to compress RS images into extremely low-bitrate streams. However, these generative models struggle to reconstruct visually plausible images due to the highly ill-posed nature of extremely low-bitrate image compression. To this end, we propose an image compression framework that utilizes a pre-trained diffusion model with powerful natural image priors to achieve high-realism reconstructions. However, diffusion models tend to hallucinate small structures and textures due to the significant information loss at limited bitrates. Thus, we introduce vector maps as semantic and structural guidance and propose a novel image compression approach named Map-Assisted Generative Compression (MAGC). MAGC employs a two-stage pipeline to compress and decompress RS images at extremely low bitrates. The first stage maps an image into a latent representation, which is then further compressed in a VAE architecture to save bitrates and serves as implicit guidance in the subsequent diffusion process. The second stage conducts a conditional diffusion model to generate a visually pleasing and semantically accurate result using implicit guidance and explicit semantic guidance. We also provide a one-step model called MAGC* to enhance the efficiency in image generation. Quantitative and qualitative comparisons show that our method outperforms standard codecs and other learning-based methods in terms of perceptual quality and semantic accuracy. The dataset and code will be publicly available at <span><span>https://github.com/WHUyyx/MAGC</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"223 ","pages":"Pages 159-172"},"PeriodicalIF":10.6,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143644234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SFA-Net: A SAM-guided focused attention network for multimodal remote sensing image matching","authors":"Tian Gao, Chaozhen Lan, Wenjun Huang, Sheng Wang","doi":"10.1016/j.isprsjprs.2025.02.032","DOIUrl":"10.1016/j.isprsjprs.2025.02.032","url":null,"abstract":"<div><div>The robust and accurate matching of multimodal remote sensing images (MRSIs) is crucial for realizing the fusion of multisource remote sensing image information. Traditional matching methods fail to exhibit effective performance when confronted with significant nonlinear radiometric distortions (NRDs) and geometric differences in MRSIs. To address this critical issue, we propose a novel framework called the SAM-guided Focused Attention Network for MRSI matching (SFA-Net). Firstly, we utilize the Segment Anything Model to extract the edge structural features of MRSIs. In the meantime, convolutional neural networks are employed to extract the local deep features of MRSIs. The obtained edge structural features are then used as a prior information to guide the region self-attention network and the focused fusion cross-attention network. This improves the uniqueness of local depth features in a single image and enhances the cross-modal representation of local depth features across different images. Finally, metric learning and optimization algorithms are applied to improve the success rate of feature matching, further enhancing the accuracy and robustness of the matching results. Experimental results on 1050 MRSI pairs confirm that SFA-Net is able to achieve high-quality matching on large-scale challenging MRSI datasets, with good adaptation to severe NRDs and geometric differences. SFA-Net outperforms state-of-the-art algorithms qualitatively and quantitatively, including RIFT, ASS, CoFSM, WSSF, HOWP, CMM-Net, R2D2, ECOTR, and LightGlue. Our code<span><span><sup>1</sup></span></span> and dataset will be made publicly available upon publication of the paper.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"223 ","pages":"Pages 188-206"},"PeriodicalIF":10.6,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143644236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Improving XCO2 retrieval under high aerosol loads with fused satellite aerosol data: Advancing understanding of anthropogenic emissions
Authors: Hao Zhu, Tianhai Cheng, Xingyu Li, Xiaotong Ye, Donghao Fan, Tao Tang, Haoran Tong, Lili Zhang
Journal: ISPRS Journal of Photogrammetry and Remote Sensing, Volume 223, Pages 146-158, published 2025-03-15. DOI: 10.1016/j.isprsjprs.2025.03.009
Abstract: Satellite measurements of the column-averaged dry air mole fraction of carbon dioxide (XCO₂) have been successfully employed to quantify anthropogenic carbon emissions under clean atmospheric conditions. However, for some large anthropogenic sources such as megacities or coal-fired power plants, which are often accompanied by high aerosol loads, especially in developing countries, atmospheric XCO₂ retrieval remains challenging. Traditional XCO₂ retrieval algorithms typically rely on model-based or single-satellite aerosol information as constraints, which offer limited accuracy under high aerosol conditions, resulting in imperfect aerosol scattering characterization. Various satellite sensors dedicated to aerosol detection provide distinct aerosol products, each with its strengths. The fusion of these products offers the potential for more accurate scattering characterization in high aerosol scenarios. Therefore, in this study, we first fused four satellite aerosol products from MODIS and VIIRS sensors using the Bayesian maximum entropy method and then incorporated the result into the XCO₂ retrieval from NASA OCO-2 observations to improve retrieval quality under high aerosol conditions. Compared to the operational products, we find that XCO₂ retrievals coupled with co-located fused aerosol data exhibit improved accuracy and precision at higher aerosol loads, evaluated against the Total Carbon Column Observing Network (TCCON). Specifically, for high aerosol loadings (AOD@755 nm > 0.25), the mean bias and mean absolute error (MAE) of the XCO₂ retrieval are reduced by 0.14 ppm and 0.1 ppm, respectively, while the standard deviation of the XCO₂ error reaches 1.68 ppm. The detection capability of point-source CO₂ emissions corresponding to this precision (1.68 ppm) is also evaluated in this study. Results show that the number of detectable coal-fired power plants globally under high aerosol conditions can be increased by 39% compared to the application of operational products. These results indicate that using fused satellite aerosol products effectively improves XCO₂ retrieval under high aerosol conditions, advancing carbon emission understanding for important anthropogenic sources, particularly in developing countries.
{"title":"Potential of Sentinel-1 time-series data for monitoring the phenology of European temperate forests","authors":"Michael Schlund","doi":"10.1016/j.isprsjprs.2025.02.026","DOIUrl":"10.1016/j.isprsjprs.2025.02.026","url":null,"abstract":"<div><div>Time series from optical sensors are frequently used to retrieve phenology information of forests. While SAR (synthetic aperture radar) sensors can potentially provide even denser time series than optical data, their potential to retrieve phenological information of forests is still underexplored. In addition, the backscatter information from SAR is frequently exploited in the same way (e.g., via dynamic thresholding) as optical data to retrieve phenological information. Sentinel-1 backscatter coefficients of VH (vertical–horizontal) and VV (vertical–vertical) polarizations and their ratio were retrieved for temperate deciduous broad-leaf and evergreen needle-leaf forests in Europe. Breakpoints and dynamic thresholds were retrieved in the locally smoothed time-series data and compared to reference data from PhenoCam and fluxtower networks. It was generally found that breakpoints outperform dynamic thresholds in both forest types in terms of root mean squared differences, bias and <span><math><msup><mrow><mi>R</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span>. Best results were achieved using breakpoints on the Sentinel-1 backscatter ratio with RMSEs of 18.4 days for the start of the season (SOS) and 14.0 days for the end of the season (EOS) compared to the 25% dynamic threshold of the seasonal amplitude in the reference data in deciduous broad-leaf forests. Substantially higher RMSE values of 56.7 days for SOS and 56.5 days for EOS were found in evergreen needle-leaf forests. This study suggests the potential of Sentinel-1 for the phenological retrieval of forests, in particular deciduous broad-leaf forests. This information could be used in combination with frequently used optical data to provide comprehensive phenological information on a large scale.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"223 ","pages":"Pages 131-145"},"PeriodicalIF":10.6,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143621001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Automated and Comprehensive Walkability Audits with Street View Images: Leveraging Virtual Reality for Enhanced Semantic Segmentation","authors":"Keundeok Park , Donghwan Ki , Sugie Lee","doi":"10.1016/j.isprsjprs.2025.02.015","DOIUrl":"10.1016/j.isprsjprs.2025.02.015","url":null,"abstract":"<div><div>Street view images (SVIs) coupled with computer vision (CV) techniques have become powerful tools in the planning and related fields for measuring the built environment. However, this methodology is often challenging to be implemented due to challenges in capturing a comprehensive set of planning-relevant environmental attributes and ensuring adequate accuracy. The shortcomings arise primarily from the annotation policies of the existing benchmark datasets used to train CV models, which are not specifically tailored to fit urban planning needs. For example, CV models trained on these existing datasets can only capture a very limited subset of the environmental features included in walkability audit tools. To address this gap, this study develops a virtual reality (VR) based benchmark dataset specifically tailored for measuring walkability with CV models. Our aim is to demonstrate that combining VR-based data with the real-world dataset (i.e., ADE20K) improves performance in automated walkability audits. Specifically, we investigate whether VR-based data enables CV models to audit a broader range of walkability-related objects (i.e., comprehensiveness) and to assess objects with enhanced accuracy (i.e., accuracy). In result, the integrated model achieves a pixel accuracy (PA) of 0.964 and an intersection-over-union (IoU) of 0.679, compared to a pixel accuracy of 0.959 and an IoU of 0.605 for the real-only model. Additionally, a model trained solely on virtual data, incorporating classes absent from the original dataset (i.e., bollards), attains a PA of 0.979 and an IoU of 0.676. These findings allow planners to adapt CV and SVI techniques for more planning-relevant purposes, such as accurately and comprehensively measuring walkability.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"223 ","pages":"Pages 78-90"},"PeriodicalIF":10.6,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143609580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}