FineCrop: Mapping fine-grained crops using class-aware feature decoupling and parcel-aware class rebalancing with Sentinel-2 time series
Lei Lei, Xinyu Wang, Yanfei Zhong, Liangpei Zhang
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 228, pp. 785–803. DOI: 10.1016/j.isprsjprs.2025.07.041. Published 2025-08-13.

Abstract: Fine-grained crop mapping refers to the precise differentiation of all crop types within an area, encompassing major classes (e.g., staple crops, cash crops, and garden fruits) and their subclasses (e.g., wheat, barley, and maize within staple crops), and is crucial for precision agriculture management. Compared with staple crop mapping, however, it faces two additional challenges: (1) the extremely similar phenological characteristics of crop subcategories, which make it difficult to extract discriminative representations; and (2) the imbalanced class distribution, which biases the model toward head classes and ultimately causes severe misclassification. In this paper, we propose a novel framework for fine-grained crop mapping, termed FineCrop, built on a class-aware feature decoupling (CFD) branch and a parcel-aware class rebalancing (PCR) branch. CFD, inspired by the "divide and conquer" principle, learns detailed, independent features for each crop type to address the phenological similarity. PCR, inspired by data aggregation, applies a class-aware factor at the parcel level to counteract the classifier bias caused by the imbalanced data distribution. To evaluate FineCrop, we built a fine-grained crop mapping dataset, termed FineCropSet, by matching Sentinel-2 Level-2A products (which have undergone radiometric and geometric correction) with labels extracted from EuroCrops. FineCropSet contains 138 crop types covering North Rhine-Westphalia, southern Slovakia, and the Netherlands across different years. The results show that FineCrop improves the overall accuracy of popular deep learning models for temporal satellite imagery by 5.83%, 1.42%, and 0.89% on the three study areas, respectively (paired t-test, p < 0.05), confirming a substantial improvement for fine-grained crop mapping. Ablation experiments reveal that FineCrop reduces the class imbalance by factors of 5 and 18 and extracts detailed parcel features from edge to center. We believe the proposed method is promising for large-scale crop mapping, particularly for crops with similar phenological characteristics and imbalanced distributions, leading to more accurate crop resource inventories.

Code and data: https://github.com/LL0912/FineCrop
RSProtoSemiSeg: Semi-supervised semantic segmentation of high spatial resolution remote sensing images with probabilistic distribution prototypes
Wenjie Sun, Yujie Lei, Danfeng Hong, Zhongwen Hu, Qingquan Li, Jie Zhang
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 228, pp. 771–784. DOI: 10.1016/j.isprsjprs.2025.07.040. Published 2025-08-12.

Abstract: Semi-supervised semantic segmentation of high spatial resolution remote sensing images aims to mitigate the reliance on labeled data by using limited labeled data alongside extensive unlabeled data, significantly reducing annotation costs and the difficulty of obtaining large-scale labeled datasets. Current semi-supervised methods for remote sensing image semantic segmentation mainly focus on improving pseudo-label quality; however, the inherent noise in pseudo-labels remains a critical issue, leading to persistent inaccuracies. Furthermore, the high intra-class variance in such images complicates pixel-wise label propagation, exacerbating pseudo-label errors and substantially constraining segmentation accuracy. To tackle these limitations, we propose a contrastive learning framework based on Gaussian mixture distributions. Our approach uses a mixture probability distribution prototype predictor to adaptively regulate the influence of intra-class prototypes on pixel representations, mapping features to a multivariate Gaussian model to mitigate pseudo-label inaccuracies. We also introduce a novel mixture contrastive loss that guides pseudo-labeled pixels toward intra-class prototypes while repelling them from inter-class prototypes, thereby enhancing representation fidelity. Extensive experiments on three remote sensing semantic segmentation datasets demonstrate the efficacy of our approach. Compared to baseline models, our method achieves mIoU improvements ranging from 0.26% to 1.86%; compared to the state-of-the-art DWL model, it achieves improvements ranging from 0.37% to 2.36%. These results confirm the effectiveness of our approach over existing semi-supervised segmentation methods.

Code (to be made available): https://github.com/aitointerp/rsprotosemiseg
Multi-level Priors-Guided Diffusion-based Remote Sensing Image Super-Resolution
Lijing Lu, Zhou Huang, Yi Bao, Lin Wan, Zhihang Li
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 228, pp. 756–770. DOI: 10.1016/j.isprsjprs.2025.07.020. Published 2025-08-11.

Abstract: Recently, diffusion models have achieved advances in natural image super-resolution (SR), overcoming some issues of traditional approaches, e.g., the performance limits of CNN- and Transformer-based methods and the unstable training and mode collapse of GANs. Despite these advances, existing diffusion-based SR methods perform poorly on remote sensing images. Current diffusion-based super-resolution techniques face two key challenges: (1) the generative prior is jeopardized by the need to train from scratch, which can lead to suboptimal performance; and (2) fidelity is lost because existing SR models draw on limited priors, taking only the low-resolution image as input. To deal with these challenges, we introduce a Multi-level Priors-Guided Diffusion-based Remote Sensing Image Super-Resolution model (DLMSR). In particular, we use a pre-trained Stable Diffusion model to retain the generative prior captured in synthesis models, yielding more stable and detailed outcomes. Furthermore, to establish comprehensive priors, we incorporate multimodal large language models (MLLMs) to capture diverse priors such as texture and content priors, and we introduce category priors via a category classifier that offers global, concise signals for precise reconstruction. We then devise a cascade prior fusion module and a class-aware encoder to integrate these rich priors into the diffusion model. DLMSR is extensively evaluated on four publicly available remote sensing datasets (AID, DOTA, DIOR, and NWPU-RESISC45), demonstrating consistent advantages over representative state-of-the-art methods. In particular, compared with StableSR, DLMSR achieves an average gain of 0.29 dB in PSNR and a decrease of 1.93 in FID across three simulated benchmarks, indicating enhanced reconstruction fidelity and perceptual quality.

Source code and dataset links: https://github.com/lijing28/DLMSR.git
{"title":"GLD-Road: A global–local decoding road network extraction model for remote sensing images","authors":"Ligao Deng, Yupeng Deng, Yu Meng, Jingbo Chen, Zhihao Xi, Diyou Liu, Qifeng Chu","doi":"10.1016/j.isprsjprs.2025.07.026","DOIUrl":"https://doi.org/10.1016/j.isprsjprs.2025.07.026","url":null,"abstract":"Road networks are essential information for map updates, autonomous driving, and disaster response. However, manual annotation of road networks from remote sensing imagery is time-consuming and costly, whereas deep learning methods have gained attention for their efficiency and precision in road extraction. Current deep learning approaches for road network extraction fall into three main categories: postprocessing methods based on semantic segmentation results, global parallel methods and local iterative methods. Postprocessing methods introduce quantization errors, leading to higher overall road network inaccuracies; global parallel methods achieve high extraction efficiency but risk road node omissions; local iterative methods excel in node detection but have relatively lower extraction efficiency. To address the above limitations, We propose a two-stage road extraction model with global–local decoding, named GLD-Road, which possesses the high efficiency of global parallel methods and the strong node perception capability of local iterative methods, enabling a significant reduction in inference time while maintaining high-precision road network extraction. In the first stage, GLD-Road extracts the coordinates and direction descriptors of road nodes using global information from the entire input image. Subsequently, it connects adjacent nodes using a self-designed graph network module (Connect Module) to form the initial road network. In the second stage, based on the road endpoints contained in the initial road network, GLD-Road iteratively searches local images and the local grid map of the primary network to repair broken roads, ultimately producing a complete road network. Since the second stage only requires limited supplementary detection of locally missing nodes, GLD-Road significantly reduces the global iterative search range over the entire image, leading to a substantial reduction in retrieval time compared to local iterative methods. Finally, experimental results revealed that GLD-Road outperformed current state-of-the-art methods, achieving improvements of 1.9% and 0.67% in average path length similarity (APLS) on the City-Scale and SpaceNet3 datasets, respectively. Moreover, compared with those of a global parallel method (Sat2Graph) and a local iterative method (RNGDet++), the retrieval time of GLD-Road exhibited reductions of 40% and 92%, respectively, suggesting that GLD-Road achieves a pronounced improvement in road network extraction efficiency compared to existing methods. 
The experimental results are available at <ce:inter-ref xlink:href=\"https://github.com/ucas-dlg/GLD-Road\" xlink:type=\"simple\">https://github.com/ucas-dlg/GLD-Road</ce:inter-ref>.","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"31 1","pages":""},"PeriodicalIF":12.7,"publicationDate":"2025-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144898132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
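The first-stage node linking can be illustrated with simple geometry. The sketch below is a hypothetical, purely geometric stand-in for the learned Connect Module: it links two detected nodes when they are close and their direction descriptors roughly point toward each other; the thresholds and the whole rule are assumptions for illustration only.

```python
import numpy as np

def connect_nodes(coords: np.ndarray, dirs: np.ndarray,
                  max_dist: float = 30.0, max_angle_deg: float = 30.0):
    """coords: (N, 2) node positions in pixels; dirs: (N, 2) unit
    direction descriptors. Returns a list of (i, j) edges."""
    edges = []
    cos_thr = np.cos(np.deg2rad(max_angle_deg))
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            v = coords[j] - coords[i]
            d = np.linalg.norm(v)
            if d == 0 or d > max_dist:
                continue
            v = v / d
            # i's descriptor should point toward j, and j's toward i
            if dirs[i] @ v >= cos_thr and dirs[j] @ (-v) >= cos_thr:
                edges.append((i, j))
    return edges
```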
{"title":"Unmixing frequency features for DEM super resolution","authors":"Zhuwei Wen, He Chen, Xianwei Zheng","doi":"10.1016/j.isprsjprs.2025.07.039","DOIUrl":"https://doi.org/10.1016/j.isprsjprs.2025.07.039","url":null,"abstract":"DEM super-resolution (SR) has recently been advanced by deep learning. The focus of existing works is mainly on the employment of various terrain constraints to force the general deep SR models to adapt to DEM data. However, we found that they leave a fundamental issue of terrain pattern confusion caused by the mixed frequency feature learning of deep neural networks, which leads to an inherent trade-off between the reconstruction of fundamental structures and the preservation of fine-grained terrain details. In this study, we propose a novel dual-frequency feature learning network (DuffNet) for high quality DEM super-resolution. The core idea of DuffNet is to directly learn the mapping relationship between low-resolution (LR) and high-resolution (HR) DEMs with meaningful frequency features, rather than the mixed convolutional features extracted from raw DEMs. Specifically, DuffNet deploys a dual-branch structure with a dedicatedly designed dual-frequency loss to enable the learning of high- and low-frequency features under the supervision of input HR DEM. An adaptive elevation amplitude refiner (AEAR) is then developed to dynamically adjust and optimize the amplitudes of the initial HR DEM synthesized by the integration of learned low-frequency and high-frequency terrain components. Extensive experiments conducted on TFASR30, Pyrenees, Tyrol, and the challenging TFASR30to10 datasets show that DuffNet can achieve state-of-the-art performance, outperforming other SoTA methods such as TTSR and CDEM by 19% and 29% respectively in RMSE-Elevation on the TFASR30to10 dataset. The dataset and source code are available at: <ce:inter-ref xlink:href=\"https://github.com/Geo-Tell/DuffNet\" xlink:type=\"simple\">https://github.com/Geo-Tell/DuffNet</ce:inter-ref>.","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"42 1","pages":""},"PeriodicalIF":12.7,"publicationDate":"2025-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144898133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised deep learning for semantic segmentation of multispectral LiDAR forest point clouds
Lassi Ruoppa, Oona Oinonen, Josef Taher, Matti Lehtomäki, Narges Takhtkeshha, Antero Kukko, Harri Kaartinen, Juha Hyyppä
ISPRS Journal of Photogrammetry and Remote Sensing. DOI: 10.1016/j.isprsjprs.2025.07.038. Published 2025-08-08.

Abstract: Point clouds captured with laser scanning systems in forest environments can be used in a wide variety of applications within forestry and plant ecology, such as estimating tree stem attributes, leaf angle distribution, and above-ground biomass. However, effectively utilizing the data in such tasks requires semantic segmentation of the data into wood and foliage points, also known as leaf–wood separation. The traditional approach to leaf–wood separation has been geometry- and radiometry-based unsupervised algorithms, which tend to perform poorly on data captured with airborne laser scanning (ALS) systems, even at high point density (>1,000 points/m²). While recent machine and deep learning approaches achieve great results even on sparse point clouds, they require manually labeled training data, which is often extremely laborious to produce. Multispectral (MS) information has been shown to have potential for improving the accuracy of leaf–wood separation, but quantitative assessment of its effects has been lacking. This study proposes a fully unsupervised deep learning method, GrowSP-ForMS, specifically designed for leaf–wood separation of high-density MS ALS point clouds (acquired at wavelengths of 532, 905, and 1550 nm) and based on the GrowSP architecture. GrowSP-ForMS achieved a mean accuracy of 84.3% and a mean intersection over union (mIoU) of 69.6% on our MS test set, outperforming the unsupervised reference methods by a significant margin. Compared to supervised deep learning methods, our model performed similarly to the slightly older PointNet architecture but was outclassed by more recent approaches. Finally, two ablation studies demonstrated that our proposed changes increased the test set mIoU of GrowSP-ForMS by 29.4 percentage points (pp) over the original GrowSP model, and that utilizing MS data improved the mIoU by 5.6 pp relative to the monospectral case.

For reproducibility, we release the GrowSP-ForMS source code and pretrained weights (https://github.com/ruoppa/GrowSP-ForMS), along with the multispectral data set (https://zenodo.org/records/15913427).
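For reference, the two figures of merit quoted above can be computed for binary leaf–wood labels as shown below; this is a generic metric implementation (mean per-class accuracy and mean IoU), not code from the released repository.

```python
import numpy as np

def leafwood_metrics(pred: np.ndarray, gt: np.ndarray):
    """pred, gt: integer arrays of per-point labels, leaf = 0, wood = 1.
    Returns (mean accuracy, mIoU), i.e., per-class recall and IoU
    averaged over the two classes."""
    ious, accs = [], []
    for c in (0, 1):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        ious.append(tp / max(tp + fp + fn, 1))
        accs.append(tp / max(tp + fn, 1))
    return float(np.mean(accs)), float(np.mean(ious))
```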
CISNet: Change information guided semantic segmentation network for automatic extraction of glacier calving fronts
Ji Zhao, Jiayu Tong, Tianhong Li, Yao Sun, Changliang Shao, Yuting Dong
ISPRS Journal of Photogrammetry and Remote Sensing. DOI: 10.1016/j.isprsjprs.2025.08.001. Published 2025-08-07.

Abstract: The movement of the glacier calving front indicates changes in the mass balance of a glacier and is crucial for analyzing trends in global sea level change. The launch of a large number of remote sensing satellites has generated massive volumes of imagery, enabling the application of deep-learning-based methods. However, existing methods generally focus solely on individual images and do not explore the relationships between glacier images. This study therefore proposes a change-information-guided semantic segmentation network (CISNet) that explores category semantic relationships in glacier images by linking semantic segmentation with change information extraction. CISNet establishes a dual-branch architecture consisting of semantic segmentation and change information extraction on a weight-shared feature extraction module. U-ConvNextV2 was developed to extract multi-scale features of different classes in glacier images by integrating a high-performance feature extraction module with the effective UNet framework; its skip-connection-based multi-scale feature fusion ensures accurate segmentation of glacier semantics. To explore relationships between different images, a pairwise change information extraction branch extracts consistent and inconsistent relationships from any image pair, and a global random matching strategy for constructing image pairs enhances the network's ability to extract glacier and ocean features. To better integrate the semantic features and change information during training, an adaptive joint loss is proposed to dynamically adjust the optimization of the two branches. Extensive experiments on the latest publicly available large-scale CaFFe dataset validate the method: CISNet outperforms state-of-the-art deep learning methods with a mean distance error (MDE) of 398 ± 43 m. To further test generalization across glaciers and regions, we selected data from one glacier area as the training set and the rest as the test set, constructing a challenging CaFFe-SI dataset; on CaFFe-SI, CISNet achieved the best MDE of 888 ± 21 m and demonstrated comprehensive superiority across the other evaluation metrics.
{"title":"Automatic in-situ radiometric calibration of TLS: Compensating distance and angle of incidence effects using overlapping scans","authors":"H. Laasch, T. Medic, N. Pfeifer, A. Wieser","doi":"10.1016/j.isprsjprs.2025.07.012","DOIUrl":"https://doi.org/10.1016/j.isprsjprs.2025.07.012","url":null,"abstract":"Terrestrial laser scanners (TLS) commonly record intensity of the backscattered signal as an auxiliary measurement, which can be related to material properties and used in various applications, such as point cloud segmentation. However, retrieving the material-related information from the TLS intensities is not trivial, as this information is overlayed by other systematic influences affecting the backscattered signal. One of the major factors that needs to be accounted for is the measurement configuration, which is defined by the instrument-to-target distance and angle of incidence (AOI). By obtaining measurement-configuration independent intensity (<mml:math altimg=\"si1.svg\" display=\"inline\"><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant=\"normal\">MCI</mml:mi></mml:mrow></mml:msub></mml:math>) material probing, classification, segmentation, and similar tasks can be enhanced. Current methods for obtaining such corrected intensities require additional dedicated measurement set-ups (often in a lab and with specialized targets) and manual work to estimate the effects of distance and AOI on the recorded values. Moreover, they are optimized only for specific datasets comprising a small number of targets with different material properties. This paper presents an automated method for in-situ estimation of <mml:math altimg=\"si1.svg\" display=\"inline\"><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant=\"normal\">MCI</mml:mi></mml:mrow></mml:msub></mml:math>, eliminating the need for additional dedicated measurements or manual work. Instead, the proposed method uses overlapping point clouds from different scan stations of an arbitrary scene that are anyway collected during a scanning project. We demonstrate the generalizability of the proposed method across different scenes and instruments, show how the retrieved <mml:math altimg=\"si1.svg\" display=\"inline\"><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant=\"normal\">MCI</mml:mi></mml:mrow></mml:msub></mml:math> values can improve segmentation, and how they increase the comparability of the intensities between different instruments.","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"97 1","pages":"648-665"},"PeriodicalIF":12.7,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144897862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DPF-Net: Physical imaging model embedded data-driven underwater image enhancement
Han Mei, Kunqian Li, Shuaixin Liu, Chengzhi Ma, Qianli Jiang
ISPRS Journal of Photogrammetry and Remote Sensing. DOI: 10.1016/j.isprsjprs.2025.07.031. Published 2025-08-07.

Abstract: Due to the complex interplay of light absorption and scattering in the underwater environment, underwater images suffer significant degradation. This research presents a two-stage underwater image enhancement network, the Data-Driven and Physical Parameters Fusion Network (DPF-Net), which harnesses the robustness of physical imaging models alongside the generality and efficiency of data-driven methods. We train the Degraded Parameters Estimation Module (DPEM) on synthetic datasets with preset physical parameters as ground truth, thereby learning a more authentic underwater imaging model, in contrast to prior works that directly fit raw-to-reference image mappings through the imaging equation. This module is subsequently trained jointly with an enhancement network, where the estimated physical parameters are integrated into a data-driven model within the embedding space. During training, in addition to traditional reference-based losses, we use a degradation consistency loss to ensure physical consistency. Furthermore, we propose a new weak reference loss term that leverages the color distribution of the entire training set, alleviating our model's reliance on the quality of individual reference images. DPF-Net demonstrates superior performance compared to other benchmark methods across multiple test sets, achieving state-of-the-art results.

Source code and pre-trained models: https://github.com/OUCVisionGroup/DPF-Net
{"title":"Incorporating prior knowledge and temporal memory transformer network for satellite video object tracking","authors":"Jiawei Zhou , Yanni Dong , Yuxiang Zhang , Bo Du","doi":"10.1016/j.isprsjprs.2025.07.032","DOIUrl":"10.1016/j.isprsjprs.2025.07.032","url":null,"abstract":"<div><div>Satellite video object tracking (SVOT) faces more challenges compared to general video tracking, such as sparse target features, cluttered backgrounds, and frequent occlusion. Although numerous researchers have proposed solutions to address these challenges, SVOT still encounters three major issues. (1) Insufficient mining of temporal information: Most methods only utilize motion cues and dynamic templates as sources of temporal information. (2) Lack of robust solutions for frequent occlusion: Existing methods typically rely on threshold hyperparameters and employ Kalman filtering as the motion model, making it challenging to handle complex and long-term occlusion scenarios. (3) Underutilization of prior knowledge: Current methods typically employ cosine windows to suppress excessive displacement, but they neglect the kinematic patterns of targets in satellite videos. In order to address the above issues, we propose a method that incorporates prior knowledge and memory transformer network, namely MemTrack. The proposed memory module adaptively extracts and stores the relevant discriminative features of the target during the tracking phase, thereby further mining target-related temporal information and enhancing the model’s perception of the target. Based on prior knowledge and motion cues, we introduce an adaptive judgment strategy that identifies occlusion scenarios according to target size without relying on threshold hyperparameters, and we employ a linear regression approach as the motion model, which is both simple and effective in mitigating frequent occlusion issues. Additionally, we develop a biased 2D Gaussian window that indicates the target’s motion trend, thereby boosting tracker performance. MemTrack experiment in four large satellite video datasets, namely SatSOT, SV248S, OOTB and VISO respectively, achieving the best performance compared to the state-of-the-art (SOTA) trackers. On the SatSOT dataset, our tracker achieves an AUC score of 57.0%, marking the first time, to the best of our knowledge, that an AUC value has surpassed 55 without satellite video training on this dataset. The results demonstrate effectiveness and superiority of proposed method in SVOT. The project is available in <span><span>https://github.com/jiawei-zhou/MemTrack.git</span><svg><path></path></svg></span>, boosting progress of the SVOT.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"228 ","pages":"Pages 630-647"},"PeriodicalIF":12.2,"publicationDate":"2025-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144779688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}