Title: A novel framework for accurate, automated and dynamic global lake mapping based on optical imagery
Authors: Tao Zhou, Guoqing Zhang, Jida Wang, Zhe Zhu, R. Iestyn Woolway, Xiaoran Han, Fenglin Xu, Jun Peng
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 280-298. DOI: 10.1016/j.isprsjprs.2025.02.008. Published 2025-02-16.

Abstract: Accurate, consistent, and long-term monitoring of global lake dynamics is essential for understanding the impacts of climate change and human activities on water resources and ecosystems. However, existing methods often require extensive manually collected training data and expert knowledge to delineate accurate water extents for various lake types under different environmental conditions, limiting their applicability in data-poor regions and in scenarios requiring rapid mapping responses (e.g., lake outburst floods) or frequent monitoring (e.g., highly dynamic reservoir operations). This study presents a novel remote sensing framework for automated global lake mapping using optical imagery, combining single-date and time-series algorithms to address these challenges. The single-date algorithm leverages a multi-objects superposition approach to automatically generate high-quality training samples, enabling robust machine learning-based lake boundary delineation with minimal manual intervention. This approach overcomes the challenge of obtaining representative training samples across diverse environmental contexts and flexibly adapts to the images to be classified. Building upon this, the time-series algorithm incorporates dynamic mapping-area adjustment, robust cloud and snow filtering, and time-series analysis, maximizing the available clear imagery (>80%) and optimizing the temporal frequency and spatial accuracy of the produced lake area time series. The framework's effectiveness is validated on Landsat imagery using globally representative and locally focused test datasets. The automatically generated training samples achieve commission and omission rates of ~1% compared with manually collected samples. The resulting single-date lake mapping demonstrates an overall accuracy exceeding 96% and a mean percentage error below 4% relative to manually delineated lake areas. Additionally, the proposed framework improves the mapping of smaller and fractionally ice-covered lakes over existing lake products. The mapped lake time series are consistent with reconstructed products over the long term, while avoiding spurious short-term changes caused by data-source and processing uncertainties. This robust, automated framework is valuable for generating accurate, large-scale, and temporally dynamic lake maps to support global lake inventories and monitoring. The framework's modular design also allows future adaptation to other optical sensors, such as Sentinel-2 and the Moderate Resolution Imaging Spectroradiometer (MODIS), facilitating multi-source data fusion and enhanced surface water mapping capabilities.
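The abstract does not detail the multi-objects superposition approach, but the general idea of auto-generating training samples from the image itself can be sketched as follows: keep only high-confidence water and non-water pixels from a spectral water index (NDWI), then train a per-scene classifier on them. The thresholds, band choice, and classifier below are illustrative assumptions, not the paper's method.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def auto_training_samples(green, nir, water_thresh=0.3, land_thresh=-0.1):
    """Derive provisional water/non-water training pixels from NDWI.
    Only high-confidence pixels far from the decision boundary are kept;
    the thresholds here are illustrative, not from the paper."""
    ndwi = (green - nir) / (green + nir + 1e-9)
    water = ndwi > water_thresh    # confident water
    land = ndwi < land_thresh      # confident non-water
    mask = water | land
    X = np.stack([green[mask], nir[mask]], axis=1)
    y = water[mask].astype(int)    # 1 = water, 0 = land
    return X, y

def classify_scene(green, nir):
    """Train on the auto-generated samples, then label every pixel."""
    X, y = auto_training_samples(green, nir)
    clf = RandomForestClassifier(n_estimators=50).fit(X, y)
    feats = np.stack([green.ravel(), nir.ravel()], axis=1)
    return clf.predict(feats).reshape(green.shape)
```

Because the training samples are drawn from the scene being classified, the classifier adapts to local atmospheric and illumination conditions, which mirrors the abstract's claim that the approach "flexibly adapts to the images to be classified".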
Title: Cross-modal semantic transfer for point cloud semantic segmentation
Authors: Zhen Cao, Xiaoxin Mi, Bo Qiu, Zhipeng Cao, Chen Long, Xinrui Yan, Chao Zheng, Zhen Dong, Bisheng Yang
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 265-279. DOI: 10.1016/j.isprsjprs.2025.01.024. Published 2025-02-14.

Abstract: 3D street scene semantic segmentation is essential for urban understanding. However, supervised point cloud segmentation networks rely heavily on expensive manual annotations and generalize poorly across datasets, which limits a range of downstream tasks. In contrast, image segmentation networks exhibit stronger generalization. Fortunately, mobile laser scanning systems can collect images and point clouds simultaneously, offering a potential route for 2D-3D semantic transfer. In this paper, we introduce a cross-modal label transfer framework for point cloud semantic segmentation that requires no 3D semantic annotation. Specifically, the proposed method takes the point clouds and associated posed images of a scene as inputs and produces pointwise semantic labels for the point clouds. We first obtain image semantic pseudo-labels from a pre-trained image segmentation model. Building on this, we construct implicit neural radiance fields (NeRF) to achieve multi-view-consistent label mapping by jointly learning color and semantic fields. We then design a superpoint semantic module that captures local geometric features on the point clouds, which substantially corrects semantic errors in the implicit field. Moreover, we introduce a dynamic-object filter and a pose adjustment module to address the spatio-temporal misalignment between point clouds and images, further enhancing the consistency of the transferred semantic labels. The proposed approach shows promising results on two street scene datasets, KITTI-360 and WHU-Urban3D, highlighting its effectiveness and reliability. Compared with SPT, the state-of-the-art point cloud semantic segmentation method, the proposed method improves mIoU by approximately 15% on the WHU-Urban3D dataset. Our code and data are available at https://github.com/a4152684/StreetSeg.
Title: M3ICNet: A cross-modal resolution-preserving building damage detection method with optical and SAR remote sensing imagery and two heterogeneous image disaster datasets
Authors: Haiming Zhang, Guorui Ma, Di Wang, Yongxian Zhang
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 224-250. DOI: 10.1016/j.isprsjprs.2025.02.004. Published 2025-02-13.

Abstract: Building damage detection based on optical and SAR remote sensing imagery can mitigate the adverse effects of weather, climate, and nighttime imaging. However, under emergency conditions, inherent limitations such as satellite availability, sensor swath width, and data sensitivity make it challenging to unify the resolution of optical and SAR imagery covering the same area. Additionally, optical imagery at varying resolutions is generally more abundant than SAR imagery. Most existing research resamples bi-temporal images to a common size before analysis, but this practice often disrupts the original data structure and can distort the spectral reflectance characteristics or scattering intensity of damaged building targets. Furthermore, the one-to-one use of optical-SAR imagery fails to leverage the richness of optical imagery resources, and there is a scarcity of optical-SAR image datasets tailored for building damage detection. To capitalize on the quantitative and resolution advantages of optical images and to extract SAR image features while preserving the original data structure, we engineered M3ICNet, a multimodal, multiresolution, multilevel information interaction and convergence network. M3ICNet accepts cross-modal, cross-resolution inputs: optical-SAR-optical image triplets whose resolutions double incrementally. This design incorporates optical imagery at two scales while maintaining the structural integrity of the SAR imagery. The network operates both horizontally and vertically, achieving multiscale resolution preservation and feature fusion alongside deep feature mining. Its parallelized feature interaction module refines the coherent representation of optical and SAR features by learning dependencies across scales through feature contraction and diffusion. Relying on this structure and its core components, M3ICNet extracts consistent damage information from optical-SAR heterogeneous imagery and detects damaged buildings effectively. We gathered optical-SAR-optical remote sensing imagery from natural disasters (e.g., the Turkey earthquake) and man-made disasters (e.g., the Russian-Ukrainian conflict) to create two multimodal building damage detection datasets (WBD and EBD). Extensive comparative experiments on these two datasets and six publicly available optical-SAR datasets, against ten supervised and unsupervised methods, show that M3ICNet achieves the highest average detection accuracy (F1-score) of nearly 80% on the damaged-building datasets and outperforms the comparison methods on the public datasets, while striking a balance between accuracy and efficiency.
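The key architectural idea, accepting three rasters at incrementally doubling resolutions without resampling any of them, can be illustrated by giving each input branch a stride matched to its resolution so all three feature maps land on a common grid for fusion. A minimal PyTorch sketch; channel counts, strides, and the fusion layer are assumptions, not M3ICNet's actual components.

```python
import torch
import torch.nn as nn

class TriBranchEncoder(nn.Module):
    """Encode an optical-SAR-optical triplet whose ground resolutions
    double incrementally (e.g. 0.5 m, 1 m, 2 m) without resampling the
    rasters: each branch downsamples by a stride matched to its input
    resolution, so all three feature maps share one spatial grid."""
    def __init__(self, ch=32):
        super().__init__()
        self.opt_hi = nn.Conv2d(3, ch, 3, stride=4, padding=1)  # finest optical
        self.sar    = nn.Conv2d(1, ch, 3, stride=2, padding=1)  # SAR, middle res
        self.opt_lo = nn.Conv2d(3, ch, 3, stride=1, padding=1)  # coarsest optical
        self.fuse   = nn.Conv2d(3 * ch, ch, 1)                  # cross-modal merge

    def forward(self, opt_hi, sar, opt_lo):
        # e.g. opt_hi 256x256, sar 128x128, opt_lo 64x64 -> all 64x64 features
        f = torch.cat([self.opt_hi(opt_hi), self.sar(sar), self.opt_lo(opt_lo)], dim=1)
        return self.fuse(f)
```

The point of the stride-matched design is that the SAR scattering structure is never interpolated, which is the distortion the abstract attributes to naive resampling.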
Title: Measurement of urban vitality with time-lapsed street-view images and object-detection for scalable assessment of pedestrian-sidewalk dynamics
Authors: Ricky Nathvani, Alicia Cavanaugh, Esra Suel, Honor Bixby, Sierra N. Clark, Antje Barbara Metzler, James Nimo, Josephine Bedford Moses, Solomon Baah, Raphael E. Arku, Brian E. Robinson, Jill Baumgartner, James E. Bennett, Abeer M. Arif, Ying Long, Samuel Agyei-Mensah, Majid Ezzati
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 251-264. DOI: 10.1016/j.isprsjprs.2025.01.038. Published 2025-02-13. Open access.

Abstract: Principles of dense, mixed-use environments and pedestrianisation are influential in urban planning practice worldwide. A key outcome espoused by these principles is "urban vitality", the continuous use of street sidewalk infrastructure throughout the day, which promotes the safety, economic viability, and attractiveness of city neighbourhoods. Vitality is hypothesised to arise from a nearby mixture of primary uses, short blocks, density of buildings and population, and diversity in the age and condition of surrounding buildings. To investigate this claim, we use a novel dataset of 2.1 million time-lapsed day and night images at 145 representative locations throughout the city of Accra, Ghana. We developed a measure of urban vitality for each location based on the coefficient of variation in pedestrian volume over time in our images, obtained from counts of people identified using object detection. We also construct measures of "generators of diversity" (mixed-use intensity; building, block, and population density; and diversity in the age of buildings) using data available across multiple cities, and perform bivariate and multivariate regressions of our urban vitality measure against these variables to test their association with vitality. We find that two or more unique kinds of amenities accessible within a five-minute walk of a given location, as well as the density of buildings (of varying ages and conditions) and short blocks, are associated with more even footfall throughout the day. Our analysis also indicates potential negative trade-offs of dense, mixed-use neighbourhoods, such as more continuous road traffic throughout the day. Our methodological approach is scalable, adaptable to different modes of image data capture, and can be widely adopted in other cities worldwide.
Title: S3OD: Size-unbiased semi-supervised object detection in aerial images
Authors: Ruixiang Zhang, Chang Xu, Fang Xu, Wen Yang, Guangjun He, Huai Yu, Gui-Song Xia
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 179-192. DOI: 10.1016/j.isprsjprs.2025.01.037. Published 2025-02-12.

Abstract: Aerial images present significant challenges to label-driven supervised learning; in particular, annotating substantial numbers of small objects is highly laborious. To maximize the utility of scarce labeled data alongside abundant unlabeled data, we present a semi-supervised learning pipeline tailored for label-efficient object detection in aerial images. In our investigation, we identify three size-related biases inherent in semi-supervised object detection (SSOD): pseudo-label imbalance, label assignment imbalance, and negative learning imbalance. These biases significantly impair the detection of small objects. To address them, we propose a novel Size-unbiased Semi-Supervised Object Detection (S3OD) pipeline for aerial images. The S3OD pipeline comprises three key components, all aimed at fostering size-unbiased learning: Size-aware Adaptive Thresholding (SAT), Size-rebalanced Label Assignment (SLA), and Teacher-guided Negative Learning (TNL). Specifically, SAT adaptively selects appropriate thresholds to filter pseudo-labels for objects at different scales. SLA balances positive samples across object sizes through resampling and reweighting. TNL alleviates the imbalance in negative samples by leveraging insights from the teacher model, enhancing the model's ability to discern object from background regions. Extensive experiments on DOTA-v1.5 and SODA-A demonstrate the superiority of S3OD over state-of-the-art competitors. Notably, with merely 5% of the SODA-A training labels, our method outperforms the fully supervised baseline by 2.17 points. Code is available at https://github.com/ZhangRuixiang-WHU/S3OD/tree/master.
{"title":"Streamlined multilayer perceptron for contaminated time series reconstruction: A case study in coastal zones of southern China","authors":"Siyu Qian , Zhaohui Xue , Mingming Jia , Hongsheng Zhang","doi":"10.1016/j.isprsjprs.2025.01.035","DOIUrl":"10.1016/j.isprsjprs.2025.01.035","url":null,"abstract":"<div><div>Time series reconstruction is pivotal for enabling continuous, long-term monitoring of environmental changes, particularly in rapidly evolving coastal ecosystems. Despite the array of developed reconstruction methods, they often fail to be effectively applied in coastal zones. In coastal zones, the dynamic environment and frequent cloud cover undermine the effectiveness of existing methods, making it challenging to accurately capture time series variations. Additionally, the need for long-term, large-scale monitoring demands methods that are both efficient and adaptable. To address these challenges, a streamlined multilayer perceptron (SMLP) method is proposed to reconstruct contaminated and long-term time series in coastal zones, consisting of three steps. Firstly, to mitigate the impact of anomalies, we constructed a frequency principle theory (FPT)-based filtering module. Subsequently, to capture variations within the time series, we proposed a frequency domain representation (FDR)-based decomposition module. Finally, considering gaps in time series, we applied an implicit neural representation (INR)-based reconstruction module. SMLP was evaluated using dense Landsat time series data from 1999 to 2019 in southern China, where the data face challenges from noise, gaps, and variations. Qualitative results show that the <span><math><mover><mrow><msub><mrow><mtext>RMSE</mtext></mrow><mrow><mi>c</mi></mrow></msub></mrow><mo>¯</mo></mover></math></span> of SMLP is 0.028, lower than other methods ranging from 0.02 to 0.05. Furthermore, quantitative analysis demonstrates that SMLP is more effective than existing approaches in mitigating the impact of anomalies and accurately capturing variations in time series. Additionally, the rapid operational speed and high transferability of SMLP makes it well-suited for long-term and large-scale applications, providing valuable support for coastal zone research.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"221 ","pages":"Pages 193-209"},"PeriodicalIF":10.6,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143388015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"4D RadarPR: Context-Aware 4D Radar Place Recognition in harsh scenarios","authors":"Yiwen Chen, Yuan Zhuang, Binliang Wang, Jianzhu Huai","doi":"10.1016/j.isprsjprs.2025.01.033","DOIUrl":"10.1016/j.isprsjprs.2025.01.033","url":null,"abstract":"<div><div>Place recognition is a fundamental technology for uncrewed systems such as robots and autonomous vehicles, enabling tasks like global localization and simultaneous localization and mapping (SLAM). Existing Place recognition technologies based on vision or LiDAR have made significant progress, but these sensors may degrade or fail in adverse conditions. 4D millimeter-wave radar offers strong resistance to particles like smoke, fog, rain, and snow, making it a promising option for robust scene perception and localization. Therefore, we explore the characteristics of 4D radar point clouds and propose a novel Context-Aware 4D Radar Place Recognition (4D RadarPR) method for adverse scenarios. Specifically, we first adopt a point-based feature extraction (PFE) module to capture raw point cloud information. On top of PFE, we propose a multi-scale context information fusion (MCIF) module to achieve local feature extraction at different scales and adaptive fusion. To capture global spatial relationships and integrate contextual information, the MCIF module introduces a fusion block based on multi-head cross-attention to combine point-wise features with local spatial features. Additionally, we explore the role of Radar Cross Section (RCS) information in enhancing the discriminability of descriptors and propose a local RCS relation-guided attention network to enhance local features before generating the global descriptor. Extensive experiments are conducted on in-house datasets and public datasets, covering various scenarios and including both long-range and short-range radar data. We compared the proposed method with several state-of-the-art approaches, including BevPlace++, LSP-Net, and Transloc4D, and achieved the best overall performance. Notably, on long-range radar data, our method achieved an average Recall@1 of 89.9%, outperforming the second-best method by 1.9%. Furthermore, our method demonstrated acceptable generalization ability across diverse scenarios, showcasing its robustness.</div></div>","PeriodicalId":50269,"journal":{"name":"ISPRS Journal of Photogrammetry and Remote Sensing","volume":"221 ","pages":"Pages 210-223"},"PeriodicalIF":10.6,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143388003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: A multi-task learning framework for dual-polarization SAR imagery despeckling in temporal change detection scenarios
Authors: Jie Li, Shaowei Shi, Liupeng Lin, Qiangqiang Yuan, Huanfeng Shen, Liangpei Zhang
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 155-178. DOI: 10.1016/j.isprsjprs.2025.01.030. Published 2025-02-11.

Abstract: The despeckling task for synthetic aperture radar (SAR) has long faced the challenge of obtaining clean reference images. Although unsupervised deep learning despeckling methods alleviate this issue, they often struggle to balance despeckling effectiveness against the preservation of spatial details. Furthermore, some unsupervised despeckling approaches overlook the effect of land cover changes when dual-temporal SAR images are used as training data. To address this, we propose a multi-task learning framework for dual-polarization SAR imagery despeckling and change detection (MTDN). The framework integrates polarization decomposition mechanisms with dual-polarization SAR images and uses a change detection network to guide and constrain the despeckling network. Specifically, the despeckling branch incorporates polarization and spatiotemporal information from dual-temporal, dual-polarization SAR images, employing attention mechanisms to recalibrate features across local/global, channel, and spatial dimensions, both before and after despeckling. The change detection branch, which combines a Transformer with convolutional neural networks, helps the despeckling branch filter out spatiotemporal information associated with substantial changes. The multi-task joint loss function is weighted by the generated change detection mask to achieve collaborative optimization. Despeckling and change detection experiments on a dual-polarization SAR dataset assess the effectiveness of the proposed framework. The despeckling experiments indicate that MTDN efficiently suppresses speckle while preserving polarization information and spatial details, surpassing current leading SAR despeckling methods: the equivalent number of looks (ENL) in the agricultural change area increases to 155.0630, and the edge detail preservation (EPD) metric improves to 0.9963, better than the comparison methods. The change detection experiments further confirm that MTDN yields precise predictions, highlighting its capability in practical applications. The code, dataset, and pre-trained MTDN will be available at https://github.com/WHU-SGG-RS-Pro-Group/PolSAR-DESPECKLING-MTDN for verification.
Title: Descriptor-based optical flow quality assessment and error model construction for visual localization
Authors: Jietao Lei, Jingbin Liu, Wei Zhang, Mengxiang Li, Juha Hyyppä
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 143-154. DOI: 10.1016/j.isprsjprs.2025.01.019. Published 2025-02-10.

Abstract: Precise matching of visual features between frames is crucial for the robustness and accuracy of visual odometry and SLAM (Simultaneous Localization and Mapping) systems. However, factors such as complex illumination and texture variations can cause significant errors in feature correspondences, degrading localization accuracy. In this paper, we use feature descriptors to validate and assess the correspondence quality of the optical flow algorithm and to establish the information matrix of visual measurements, improving the accuracy of visual localization within a nonlinear optimization framework. This approach to optical flow quality assessment leverages the complementary advantages of optical flow and descriptor matching, and it is applicable to other visual odometry or SLAM systems that use optical flow for feature correspondence. We first demonstrate through simulation experiments the statistical correlation between optical flow error and descriptor Hamming distance. Based on this correlation, the optical flow tracking error is quantitatively estimated from the descriptor Hamming distance. Features with large tracking errors are rejected as outliers, and the remaining features are retained with an adequate error model, i.e., an information matrix in the nonlinear optimization that reflects the visual tracking error. Furthermore, rather than the direct tracking error between the initial observation frame and the current frame, we propose the cumulative tracking error for successive frames (CTE-SF) to improve the efficiency of descriptor extraction during successive visual tracking, as it requires no construction of multi-scale image pyramids. We evaluated the proposed solution on open datasets and our in-house embedded positioning device. The results indicate that it improves the accuracy of visual odometry systems that use optical flow for feature correspondence (e.g., VINS-Mono) by approximately 10%-50%, while requiring only an 11% increase in computational resource consumption. Our implementation is open source at https://github.com/Jett64/VINS-with-Error-Model.
Title: Image motion degradation compensation for high dynamic imaging of space-based vertical orbit scanning
Authors: Jiamin Du, Xiubin Yang, Zongqiang Fu, Suining Gao, Tianyu Zhang, Jinyan Zou, Xi He, Shaoen Wang
ISPRS Journal of Photogrammetry and Remote Sensing, Volume 221, Pages 124-142. DOI: 10.1016/j.isprsjprs.2025.01.029. Published 2025-02-09.

Abstract: The Rotating Payload Satellite (RPS) uses payload rotation to drive the optical axis in a vertical-orbit scan, enabling high-resolution, wide-coverage imaging of curved ground targets. However, irregular image motion degradation (IMD) during dynamic imaging drastically degrades image quality, making high-stability, high-precision IMD compensation key to high-resolution RPS imaging. In this paper, an IMD compensation model is proposed based on velocity vector prediction and multiple disturbance identification. First, time-varying multi-dimensional velocity vectors are analyzed from the object-to-image mapping relationship and used to predict the rotation angle of the sensor, ensuring that the sensor's exposure direction always follows the direction of image motion. Then, to enhance the accuracy and stability of the compensation, the actual angular velocity of sensor rotation is extracted from the various disturbance sources through coordinate transformation and provided as feedback. Experiments show that the precision and stability of sensor rotation reach 3.925 × 10⁻³ deg/s and 8.574 × 10⁻⁴ deg/s, respectively, and the compensation error remains below the 1/3-pixel threshold. Simulated RPS images show significant deblurring and cumulative deformation correction, with image quality improved by 52.68% after compensation. This demonstrates that our approach is highly effective and crucial for the practical application of RPS.