"Hybrid depth-event pose estimation for online dense reconstruction in challenging conditions"
Guohua Gou, Xuanhao Wang, Yang Ye, Han Li, Hao Zhang, Weicheng Jiang, Mingting Zhou, Haigang Sui
DOI: 10.1016/j.isprsjprs.2025.03.013. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 328–343. Published 2025-03-25.

Abstract: In this paper, we present a novel dense SLAM system based on depth-event fusion, aiming to address the challenge of online dense reconstruction in challenging environments. To achieve robust camera tracking, we devise a hybrid depth-event pose estimation framework based on random optimization, which estimates all states jointly. Notably, we introduce an innovative 3D-2D edge alignment method based on particle swarm optimization, specifically tailored for event cameras, to tackle the highly non-linear pose estimation problem. Furthermore, we implement a dynamic update mechanism for both geometric and intensity edges of the 3D reconstruction, enabling efficient and accurate management of edge information. Our method represents the first depth-event dense SLAM system employing a random optimization paradigm, achieving robust performance even under high-speed camera motion, specifically linear velocities exceeding 1 m/s and/or angular velocities exceeding 2 rad/s. The system achieves accurate and globally consistent dense mapping with a maximum spatial resolution of 2 mm, while maintaining real-time performance at approximately 30 FPS for simultaneous localization and 3D reconstruction. Through extensive evaluations on synthetic and real-world datasets, particularly on our newly constructed DEveSet dataset, we demonstrate the superior performance of our proposed method compared to state-of-the-art techniques such as InfiniTAM, ROSEFusion, and DEVO. Contact us for access to the DEveSet download link.
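The abstract gives no implementation details of the particle-swarm-based 3D-2D edge alignment; purely as an illustrative stand-in, the sketch below uses a generic particle swarm optimizer to recover a 2D rigid transform aligning two edge point sets. Everything here (the toy cost, the swarm parameters) is an assumption of the sketch, not the paper's method.

```python
import math, random

random.seed(0)

# Toy "edge alignment": recover the rigid transform (tx, ty, theta)
# that maps model edge points onto observed edge points.
model = [(float(x), float(y)) for x in range(5) for y in range(5)]
true_pose = (2.0, -1.0, 0.3)

def transform(points, pose):
    tx, ty, th = pose
    c, s = math.cos(th), math.sin(th)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

observed = transform(model, true_pose)

def cost(pose):
    # Mean squared distance between transformed model points and observations
    # (known correspondences; real edge alignment has no such luxury).
    moved = transform(model, pose)
    return sum((mx - ox) ** 2 + (my - oy) ** 2
               for (mx, my), (ox, oy) in zip(moved, observed)) / len(model)

def pso(n_particles=40, iters=120, w=0.7, c1=1.5, c2=1.5):
    dims = 3
    pos = [[random.uniform(-4, 4) for _ in range(dims)] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c_i = cost(pos[i])
            if c_i < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c_i
                if c_i < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c_i
    return gbest, gbest_cost

pose, err = pso()
print(pose, err)
```

The appeal of such derivative-free search, as the abstract notes, is robustness on a highly non-linear cost where gradient-based alignment can stall in local minima.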
"HeteCD: Feature Consistency Alignment and difference mining for heterogeneous remote sensing image change detection"
Wei Jing, Haichen Bai, Binbin Song, Weiping Ni, Junzheng Wu, Qi Wang
DOI: 10.1016/j.isprsjprs.2025.03.008. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 317–327. Published 2025-03-22.

Abstract: Optical change detection is limited by imaging conditions, hindering real-time applications. Synthetic Aperture Radar (SAR) overcomes these limitations by penetrating clouds and being unaffected by lighting, enabling all-weather monitoring when combined with optical data. However, existing heterogeneous change detection datasets lack complexity, focusing on single-scene targets. To address this gap, we introduce the XiongAn dataset, a novel urban architectural change dataset designed to advance heterogeneous change detection research. Furthermore, we propose HeteCD, a fully supervised heterogeneous change detection framework. HeteCD employs a Siamese Transformer architecture with non-shared weights to effectively model heterogeneous feature spaces and includes a Feature Consistency Alignment (FCA) loss to harmonize distributions and ensure class consistency across bi-temporal images. Additionally, a 3D Spatio-temporal Attention Difference module is incorporated to extract highly discriminative difference information from bi-temporal features. Extensive experiments on the XiongAn dataset demonstrate that HeteCD achieves a superior IoU of 67.50%, outperforming previous state-of-the-art methods by 1.31%. The code will be available at https://github.com/weiAI1996/HeteCD.
"Superpixel-aware credible dual-expert learning for land cover mapping using historical land cover product"
Yujia Chen, Guo Zhang, Hao Cui, Xue Li, Shasha Hou, Chunyang Zhu, Zhigang Xie, Deren Li
DOI: 10.1016/j.isprsjprs.2025.02.014. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 296–316. Published 2025-03-21.

Abstract: One of the key solutions to the challenge of collecting training labels for high-resolution remote sensing images is to leverage prior information from historical land cover products, which includes knowledge derived from both same- and low-resolution land cover products (relative to the targeted images). However, employing these products directly as training labels fails to yield encouraging results in the pixel-level training process due to the widespread presence of complex noise labels. These noise labels can be categorized into scale-response noise labels, resulting from resolution discrepancies, and model-cognitive noise labels, caused by misclassifications from historical classification models or temporal changes. To address these noise labels, we propose employing superpixels as training units to mitigate scale-response and small-scale model-cognitive noise labels. The large-scale model-cognitive noise labels can then be adaptively optimized during the training process by integrating multi-source knowledge. Accordingly, we design a superpixel-aware credible dual-expert weakly supervised learning (SCDWSL) approach for high-resolution land cover mapping. Our method utilizes the multi-scale contextual information perception capabilities of superpixels and integrates credible assessment from a dual-expert knowledge framework to hierarchically tackle various noise labels. To validate the effectiveness of SCDWSL, we conduct experiments using the 10-m-resolution WorldCover product as labels. First, we evaluate its capacity to handle both scale-response and model-cognitive noise using the National Agricultural Imagery Program dataset and a GaoFen-2 image (1-m resolution). Second, we assess its performance on addressing model-cognitive noise alone using Sentinel-2 data. Extensive experiments demonstrate that SCDWSL outperforms existing weakly supervised methods across three datasets, highlighting its unique advantages and applicability to large-scale land cover mapping.
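The core idea of using superpixels as training units can be illustrated in miniature: assign each superpixel the majority label from the historical product, so that isolated pixel-level noise labels are outvoted. This toy sketch is not the authors' pipeline (which adds credible dual-expert assessment); the data and function names are invented for illustration.

```python
from collections import Counter

# Toy 4x4 scene: a superpixel map and noisy pixel labels from a
# historical land cover product (0 = water, 1 = vegetation).
superpixels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
noisy_labels = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],   # one noisy pixel inside superpixel 0
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]

def superpixel_labels(spx, labels):
    """Majority-vote label per superpixel, used as the training unit."""
    votes = {}
    for row_s, row_l in zip(spx, labels):
        for s, l in zip(row_s, row_l):
            votes.setdefault(s, Counter())[l] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

print(superpixel_labels(superpixels, noisy_labels))
# superpixel 0 keeps label 0 despite the single noisy pixel
```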
"3D semantic segmentation: Cluster-based sampling and proximity hashing for novel class discovery"
Jing Du, Linlin Xu, Lingfei Ma, Kyle Gao, John Zelek, Jonathan Li
DOI: 10.1016/j.isprsjprs.2025.03.001. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 274–295. Published 2025-03-20.

Abstract: Novel Class Discovery (NCD) in 3D semantic segmentation is crucial for applications requiring the ability to learn and segment previously unknown classes in point cloud data, such as autonomous driving and urban planning. Traditional 3D semantic segmentation methods often build upon a fixed set of known classes, which restricts their ability to discover classes not covered in the original training data. To overcome these limitations, we propose a novel framework specifically designed for NCD in 3D semantic segmentation. The framework integrates the Voxel-Geometry Data Integration module, the Cluster-based Representative Sampling module, the Neighborhood Spatial Partitioning module, and the Spatial Feature Attention Mechanism. These modules collectively enhance the model’s capability to integrate spatial and geometric information, identify key representative points, map neighborhoods effectively, and synthesize localized and global features. Experimental results on benchmark datasets, including S3DIS, Toronto-3D, SemanticSTF, and SemanticPOSS, demonstrate the proposed method’s superior performance in discovering novel classes and improving overall segmentation quality. For instance, in the SemanticPOSS-4⁰ split, the method achieves a mean Intersection over Union (mIoU) of 43.68% for novel classes, compared to 35.70% achieved by NOPS. These results highlight the framework’s effectiveness in handling complex scenes and its potential to advance NCD in 3D semantic segmentation.
"Omni-Scene Infrared Vehicle Detection: An Efficient Selective Aggregation approach and a unified benchmark"
Nan Zhang, Borui Chai, Jiamin Song, Tian Tian, Pengfei Zhu, Jiayi Ma, Jinwen Tian
DOI: 10.1016/j.isprsjprs.2025.03.002. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 244–260. Published 2025-03-19.

Abstract: Vehicle detection in infrared aerial imagery is essential for military and civilian applications due to its effectiveness in low-light and adverse scenarios. However, the low spectral and pixel resolution of long-wave infrared (LWIR) results in limited information compared to visible light, causing significant background interference. Moreover, varying thermal radiation from vehicle movement and environmental factors creates diverse vehicle patterns, complicating accurate detection and recognition. To address these challenges, we propose the Omni-Scene Infrared Vehicle Detection Network (OSIV-Net), a framework optimized for scene adaptability in infrared vehicle detection. The core architecture of OSIV-Net employs Efficient Selective Aggregation Blocks (ESABlocks), combining Anchor-Adaptive Convolution (A²Conv) in shallow layers and the Magic Cube Module (MCM) in deeper layers to accurately capture and selectively aggregate features. A²Conv captures the local intrinsic and variable patterns of vehicles by combining differential and dynamic convolutions, while MCM flexibly integrates global features from three dimensions of the feature map. In addition, we constructed the Omni-Scene Infrared Vehicle (OSIV) dataset, the most comprehensive infrared aerial vehicle dataset to date, with 39,583 images spanning nine distinct scenes and over 617,000 annotated vehicle instances across five categories, providing a robust benchmark for advancing infrared vehicle detection across varied environments. Experimental results on the DroneVehicle and OSIV datasets demonstrate that OSIV-Net achieves state-of-the-art (SOTA) performance across various scenarios. Specifically, it attains 82.60% mAP@0.5 on the DroneVehicle dataset, surpassing the previous infrared modality SOTA method DTNet by +4.27% and the multi-modal SOTA method MGMF by +2.3%. On the OSIV dataset, it attains an average performance of 78.14% across all scenarios, outperforming DTNet by +6.13%. The dataset and code can be downloaded from https://github.com/rslab1111/OSIV.
"InSAR estimates of excess ground ice concentrations near the permafrost table"
S. Zwieback, G. Iwahana, Q. Chang, F. Meyer
DOI: 10.1016/j.isprsjprs.2025.03.004. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 261–273. Published 2025-03-19.

Abstract: Ground ice melt can reshape permafrost environments, with repercussions for Northern livelihoods and infrastructure. However, fine-scale permafrost ground ice products are lacking, limiting environmental change predictions. We propose an InSAR-based approach for estimating ground ice near the permafrost table in sparsely vegetated terrain underlain by continuous permafrost. The Bayesian inversion retrieves ice content by matching the subsidence predicted by a forward model to InSAR observations, accounting for atmospheric, decorrelation, and model parameter uncertainty. We specifically estimate the excess ice concentration of materials that thaw at the end of summer; in summers with deep thaw, these materials overlap with the previous years’ upper permafrost. In a very warm summer in Northwestern Alaska, Sentinel-1 retrievals showed average excess ice concentrations of, respectively, 0.4 and 0.0 in locations independently determined to be ice-rich and ice-poor. In ice-rich locations, the estimates were lower in the preceding warm summer, indicating the thaw front rarely penetrated deep into the ice-rich intermediate layer. Performance was sensitive to the density of stable reference points for atmospheric correction, with deviations of up to 0.3 and increased uncertainty when fewer reference points were used. Toward filling gaps and mitigating InSAR retrieval errors far from reference points, we determined the predictability of the InSAR ice concentrations from topographic and optical surface proxies, finding a moderate R² of 0.6, with slope being the most important predictor. In summary, the InSAR inversion provides quantitative ice concentration estimates near the permafrost table independent of surface manifestations of ground ice, in-situ observations, and geological information. Its combination with optical remote sensing and geological information has the potential to provide seamless, fine-scale permafrost ground ice products.
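The paper's inversion accounts for atmospheric, decorrelation, and model-parameter uncertainty; purely to illustrate the match-subsidence-to-observation idea, a one-parameter grid-based Bayesian inversion might look like the sketch below. The linear forward model, thaw depth, and noise level are assumptions of the sketch, not values from the paper.

```python
import math

# Toy forward model: end-of-season subsidence (m) grows linearly with the
# excess ice concentration e of the newly thawed layer (illustrative only).
THAW_DEPTH = 0.5  # m of material that thawed this season (assumed)

def forward(excess_ice):
    """Predicted subsidence if the thawed layer has this excess ice fraction."""
    return excess_ice * THAW_DEPTH

def posterior(observed, sigma=0.02, n=101):
    """Grid posterior over e in [0, 1]: flat prior, Gaussian observation noise."""
    grid = [i / (n - 1) for i in range(n)]
    like = [math.exp(-0.5 * ((observed - forward(e)) / sigma) ** 2) for e in grid]
    z = sum(like)
    return grid, [l / z for l in like]

grid, post = posterior(observed=0.2)  # 20 cm of InSAR-derived subsidence
e_mean = sum(e * p for e, p in zip(grid, post))
print(round(e_mean, 2))
```

The posterior spread shows directly how observation noise (here a single sigma) maps into ice-concentration uncertainty, which is why reference-point density matters so much in the real retrieval.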
"TRSP: Texture reconstruction algorithm driven by prior knowledge of ground object types"
Zhendong Liu, Liang Zhai, Jie Yin, Xiaoli Liu, Shilong Zhang, Dongyang Wang, Abbas Rajabifard, Yiqun Chen
DOI: 10.1016/j.isprsjprs.2025.03.015. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 221–243. Published 2025-03-19.

Abstract: Texture reconstruction algorithms use multiview images and 3D geometric surface models as data sources, establishing the mapping relationship and texture consistency constraints between 2D images and 3D geometric surfaces to produce a photorealistic, colored 3D surface model. Existing algorithms still struggle with texture quality in dynamic scenes with complex outdoor features and varying lighting environments. In this paper, a texture reconstruction algorithm driven by prior knowledge of ground object types is proposed. First, a multiscale and multifactor joint screening strategy is constructed to generate sparse key scenes with occlusion awareness. Second, globally consistent 3D semantic mapping rules and semantic similarity measures are proposed: the multiview 2D image semantic segmentation results are refined, fused, and mapped into 3D semantic category information. Then, the 3D model semantic information is introduced to construct an energy function encoding the prior knowledge of ground objects, and the color of texture block boundaries is adjusted. Experimental verification and analysis are conducted on public and real-world datasets. Compared with well-known algorithms such as Allene, Waechter, and OpenMVS, the proposed algorithm reduces the core texture-quality error indicators by 57.14%, 53.24%, and 50.69%, respectively, and performs best in terms of clarity and contrast of texture details; the effective culling rate of moving objects is about 80%–88.9%, texture mapping is cleaner, and redundant computation is significantly reduced.
"GOOD: Towards domain generalized oriented object detection"
Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia
DOI: 10.1016/j.isprsjprs.2025.02.025. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 207–220. Published 2025-03-18.

Abstract: Oriented object detection has developed rapidly in the past few years, but most methods assume the training and testing images follow the same statistical distribution, which is far from reality. In this paper, we propose the task of domain generalized oriented object detection, which explores the generalization of oriented object detectors to arbitrary unseen target domains. Learning domain generalized oriented object detectors is particularly challenging, as cross-domain style variation not only negatively impacts the content representation, but also leads to unreliable orientation predictions. To address these challenges, we propose a generalized oriented object detector (GOOD). After style hallucination by the emerging contrastive language-image pre-training (CLIP), it consists of two key components, namely, rotation-aware content consistency learning (RAC) and style consistency learning (SEC). The proposed RAC allows the oriented object detector to learn stable orientation representations from style-diversified samples. The proposed SEC further stabilizes the generalization ability of the content representation across different image styles. Notably, both learning objectives are simple, straightforward, and easy to implement. Extensive experiments on multiple cross-domain settings show the state-of-the-art performance of GOOD. Source code is available at https://github.com/BiQiWHU/GOOD.
"Semantically-Aware Contrastive Learning for multispectral remote sensing images"
Leandro Stival, Ricardo da Silva Torres, Helio Pedrini
DOI: 10.1016/j.isprsjprs.2025.02.024. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 173–187. Published 2025-03-18.

Abstract: Satellites continuously capture vast amounts of data daily, including multispectral remote sensing images (MSRSI), which facilitate the analysis of planetary processes and changes. New machine-learning techniques are employed to develop models to identify regions with significant changes, predict land-use conditions, and segment areas of interest. However, these methods often require large volumes of labeled data for effective training, limiting the utilization of captured data in practice. According to current literature, self-supervised learning (SSL) can be effectively applied to learn how to represent MSRSI. This work introduces Semantically-Aware Contrastive Learning (SACo+), a novel method for training a model using SSL for MSRSI. Relevant known band combinations are utilized to extract semantic information from the MSRSI and texture-based representations, serving as anchors for constructing a feature space. This approach is resilient against changes and yields semantically informative results using contrastive techniques based on sample visual properties, their categories, and their changes over time. This enables training the model using classic SSL contrastive frameworks, such as MoCo and its remote sensing version, SeCo, while also leveraging intrinsic semantic information. SACo+ generates features for each semantic group (band combination), highlighting regions in the images (such as vegetation, urban areas, and water bodies), and explores texture properties encoded with Local Binary Patterns (LBP). To demonstrate the efficacy of our approach, we trained ResNet models with MSRSI using the semantic band combinations in SSL frameworks. Subsequently, we compared these models on three distinct tasks: land cover classification using the EuroSAT dataset, change detection using the OSCD dataset, and semantic segmentation using the PASTIS and GID datasets. Our results demonstrate that leveraging semantic and texture features enhances the quality of the feature space, leading to improved performance in all benchmark tasks. The model implementation and weights are available at https://github.com/lstival/SACo (as of Jan. 2025).
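The texture encoding SACo+ builds on, the Local Binary Pattern, can be sketched in a few lines. This is the basic 8-neighbor variant on a raw 2D array; the paper may use a different LBP configuration (radius, uniform patterns, histogramming).

```python
def lbp8(img):
    """Basic 8-neighbor Local Binary Pattern for a 2D grayscale image.

    Each interior pixel gets an 8-bit code: one bit per neighbor,
    set when that neighbor is >= the center value. Border pixels stay 0.
    """
    # Clockwise neighbor offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= c:
                    code |= 1 << bit
            out[y][x] = code
    return out

img = [
    [5, 5, 5],
    [5, 4, 5],
    [5, 5, 5],
]
print(lbp8(img)[1][1])  # all neighbors >= center -> code 255
```

Because the code depends only on the sign of local intensity differences, the resulting texture map is largely invariant to global brightness shifts, which is what makes it a useful anchor alongside band-combination semantics.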
"Map-Assisted remote-sensing image compression at extremely low bitrates"
Yixuan Ye, Ce Wang, Wanjie Sun, Zhenzhong Chen
DOI: 10.1016/j.isprsjprs.2025.03.005. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 223, pp. 159–172. Published 2025-03-18.

Abstract: Remote-sensing (RS) image compression at extremely low bitrates has always been a challenging task in practical scenarios like edge device storage and narrow bandwidth transmission. Generative models including VAEs and GANs have been explored to compress RS images into extremely low-bitrate streams. However, these generative models struggle to reconstruct visually plausible images due to the highly ill-posed nature of extremely low-bitrate image compression. To this end, we propose an image compression framework that utilizes a pre-trained diffusion model with powerful natural image priors to achieve high-realism reconstructions. However, diffusion models tend to hallucinate small structures and textures due to the significant information loss at limited bitrates. Thus, we introduce vector maps as semantic and structural guidance and propose a novel image compression approach named Map-Assisted Generative Compression (MAGC). MAGC employs a two-stage pipeline to compress and decompress RS images at extremely low bitrates. The first stage maps an image into a latent representation, which is then further compressed in a VAE architecture to save bitrates and serves as implicit guidance in the subsequent diffusion process. The second stage conducts a conditional diffusion model to generate a visually pleasing and semantically accurate result using implicit guidance and explicit semantic guidance. We also provide a one-step model called MAGC* to enhance the efficiency of image generation. Quantitative and qualitative comparisons show that our method outperforms standard codecs and other learning-based methods in terms of perceptual quality and semantic accuracy. The dataset and code will be publicly available at https://github.com/WHUyyx/MAGC.