Multimodal deep learning enables forest height mapping from patchy spaceborne LiDAR using SAR and passive optical satellite data
Man Chen, Wenquan Dong, Hao Yu, Iain H. Woodhouse, Casey M. Ryan, Haoyu Liu, Selena Georgiou, Edward T.A. Mitchard
International Journal of Applied Earth Observation and Geoinformation, volume 143, Article 104814, published 2025-09-01. DOI: 10.1016/j.jag.2025.104814
Impact factor: 8.6. JCR: Q1 (Remote Sensing).
https://www.sciencedirect.com/science/article/pii/S1569843225004613
Citations: 0
Abstract
Accurate estimation of forest height plays a pivotal role in mapping carbon stocks from space. Spaceborne LiDARs give accurate spot estimates of forest canopy height, but sample only a tiny fraction of the landscape. The gaps must therefore be filled using other satellite remote sensing data. Although several studies have employed machine learning methods to produce wall-to-wall forest height maps, they have generally overlooked the distinct characteristics of various remote sensing data sources and have not fully exploited the potential benefits of multisource remote sensing integration. In this study, we propose a novel deep learning framework termed the multimodal attention remote sensing network (MARSNet) to extrapolate dominant heights derived from the Global Ecosystem Dynamics Investigation (GEDI), using Sentinel-1 C-band Synthetic Aperture Radar (SAR) data, Advanced Land Observing Satellite-2 (ALOS-2) Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2) data, and Sentinel-2 passive optical data. MARSNet comprises separate encoders for each remote sensing data modality to extract multi-scale features, and a shared decoder to fuse the features and estimate height. Using individual encoders for each remote sensing data source avoids interference across modalities and extracts distinct representations. To focus on the useful information from each dataset, we reduce the prevalent spatial and layer redundancies in each remote sensing dataset by incorporating extended spatial and layer reconstruction convolution (ESLConv) modules in the encoders. MARSNet achieves good performance in estimating dominant height, with an R² of 0.62 and RMSE of 2.82 m on test data, outperforming the widely used random forest (RF) approach, which attained an R² of 0.55 and RMSE of 3.05 m using the same layers. 
We demonstrate the efficacy of the MARSNet modules and the expansion of data sources for improving dominant height estimation through network ablation studies and data ablation studies. Finally, we apply the trained MARSNet model to generate wall-to-wall maps at 10 m resolution for Jilin province, China. Through independent validation using field measurements, MARSNet demonstrates an R² of 0.54 and RMSE of 3.76 m, compared with an R² of 0.39 and RMSE of 4.37 m for the RF baseline model. Additionally, MARSNet effectively mitigates the common tendency of RF models to overestimate in low-height areas and underestimate in high-canopy areas (low sensitivity). Our research demonstrates the effectiveness of a multimodal deep learning approach fusing GEDI with SAR and passive optical imagery for enhancing the accuracy of high-resolution dominant height estimation. This method shows promise for enabling accurate large-scale forest height mapping in areas where high-quality ground data are available, potentially revolutionizing our understanding of global forest structure and carbon stocks.
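The abstract reports accuracy as R² and RMSE. As a minimal sketch of how such scores are commonly computed, the Python below uses the coefficient-of-determination form of R² (1 minus the ratio of residual to total sum of squares). This is an assumption: the abstract does not state which definition of R² the authors used, and some studies report the squared Pearson correlation instead. The function names and the four height values are hypothetical and are not data from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the inputs (metres here)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical reference and predicted dominant heights in metres.
reference_heights = [10.0, 12.0, 14.0, 16.0]
predicted_heights = [11.0, 12.0, 13.0, 17.0]

print(f"RMSE = {rmse(reference_heights, predicted_heights):.3f} m")
print(f"R2   = {r_squared(reference_heights, predicted_heights):.3f}")
```

Under this definition, a model that compresses its predictions toward the mean height (overestimating short stands and underestimating tall ones) is penalized through a larger residual sum of squares, which is one way the low-sensitivity behaviour described for the RF baseline would show up in the scores.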
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.