Multimodal deep learning enables forest height mapping from patchy spaceborne LiDAR using SAR and passive optical satellite data

Impact factor 8.6 · JCR Q1 (Remote Sensing)

International Journal of Applied Earth Observation and Geoinformation: ITC Journal · Published: 2025-09-01 · DOI: 10.1016/j.jag.2025.104814

Man Chen, Wenquan Dong, Hao Yu, Iain H. Woodhouse, Casey M. Ryan, Haoyu Liu, Selena Georgiou, Edward T.A. Mitchard

Volume 143, Article 104814 · https://www.sciencedirect.com/science/article/pii/S1569843225004613

Abstract

Accurate estimation of forest height plays a pivotal role in mapping carbon stocks from space. Spaceborne LiDARs give accurate spot estimates of forest canopy height but sample only a tiny fraction of the landscape, so the gaps must be filled using other satellite remote sensing data. Although several studies have used machine learning methods to produce wall-to-wall forest height maps, they have generally overlooked the distinct characteristics of the various remote sensing data sources and have not fully exploited the potential benefits of multisource integration. In this study, we propose a novel deep learning framework, the multimodal attention remote sensing network (MARSNet), to extrapolate dominant heights derived from the Global Ecosystem Dynamics Investigation (GEDI) using Sentinel-1 C-band Synthetic Aperture Radar (SAR) data, Advanced Land Observing Satellite-2 (ALOS-2) Phased Array type L-band Synthetic Aperture Radar-2 (PALSAR-2) data, and Sentinel-2 passive optical data. MARSNet comprises a separate encoder for each remote sensing modality, which extracts multi-scale features, and a shared decoder that fuses the features and estimates height. Using an individual encoder for each data source avoids interference across modalities and yields distinct representations. To focus on the useful information in each dataset, we reduce the spatial and layer redundancies prevalent in each remote sensing dataset by incorporating extended spatial and layer reconstruction convolution (ESLConv) modules in the encoders. On test data, MARSNet estimates dominant height with an R² of 0.62 and an RMSE of 2.82 m, outperforming the widely used random forest (RF) approach, which attained an R² of 0.55 and an RMSE of 3.05 m using the same input layers.
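
The abstract describes the architecture only at a high level: one encoder per modality producing multi-scale features, and a single shared decoder that fuses them and regresses height. The NumPy sketch below illustrates that data flow under strong simplifying assumptions. Each hypothetical `ModalityEncoder` is just two pointwise (1x1) projections at full and half resolution, fusion in the hypothetical `SharedDecoder` is plain channel concatenation, the weights are random rather than trained, and the band counts (2 for Sentinel-1, 2 for PALSAR-2, 10 for Sentinel-2) and 32x32 patch size are assumptions. It omits the ESLConv and attention modules entirely and is not the authors' MARSNet implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights for illustration only


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling of a (C, H, W) array; H and W must be even."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) array."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def pointwise(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """1x1 convolution: the same linear map over channels at every pixel."""
    return np.einsum("oc,chw->ohw", w, x)


class ModalityEncoder:
    """Hypothetical encoder for ONE modality, returning features at two scales."""

    def __init__(self, in_ch: int, feat_ch: int):
        self.w1 = rng.normal(scale=0.1, size=(feat_ch, in_ch))
        self.w2 = rng.normal(scale=0.1, size=(feat_ch, feat_ch))

    def __call__(self, x: np.ndarray) -> list[np.ndarray]:
        f_full = relu(pointwise(self.w1, x))                  # (F, H, W)
        f_half = relu(pointwise(self.w2, avg_pool2(f_full)))  # (F, H/2, W/2)
        return [f_full, f_half]


class SharedDecoder:
    """Hypothetical decoder shared by all modalities: fuse, then regress height."""

    def __init__(self, n_modalities: int, feat_ch: int):
        # Each modality contributes feat_ch channels at each of two scales.
        self.w_out = rng.normal(scale=0.1, size=(1, 2 * n_modalities * feat_ch))

    def __call__(self, features: list[list[np.ndarray]]) -> np.ndarray:
        maps = []
        for f_full, f_half in features:
            maps.extend([f_full, upsample2(f_half)])  # coarse scale back to (H, W)
        fused = np.concatenate(maps, axis=0)          # channel-wise fusion
        return pointwise(self.w_out, fused)[0]        # (H, W) height map


# Assumed input band counts per modality (not taken from the paper).
channels = {"sentinel1": 2, "palsar2": 2, "sentinel2": 10}
encoders = {name: ModalityEncoder(c, feat_ch=8) for name, c in channels.items()}
decoder = SharedDecoder(n_modalities=len(channels), feat_ch=8)

patch = {name: rng.normal(size=(c, 32, 32)) for name, c in channels.items()}
height = decoder([encoders[name](patch[name]) for name in channels])
print(height.shape)  # (32, 32)
```

Keeping one weight set per modality is the point of the design the abstract argues for: SAR backscatter and optical reflectance never share early filters, and they only meet once the decoder concatenates their features.
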
Through network ablation and data ablation studies, we demonstrate that both the MARSNet modules and the expanded set of data sources improve dominant height estimation. Finally, we apply the trained MARSNet model to generate wall-to-wall maps at 10 m resolution for Jilin Province, China. In independent validation against field measurements, MARSNet achieves an R² of 0.54 and an RMSE of 3.76 m, compared with 0.39 and 4.37 m for the RF baseline model. MARSNet also mitigates the common tendency of RF models to overestimate height in low-canopy areas and underestimate it in tall-canopy areas (low sensitivity). Our results show that a multimodal deep learning approach fusing GEDI with SAR and passive optical imagery improves the accuracy of high-resolution dominant height estimation. This method shows promise for enabling accurate large-scale forest height mapping in areas where high-quality ground data are available, potentially revolutionizing our understanding of global forest structure and carbon stocks.
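
The abstract compares models by R², by RMSE, and by "sensitivity", the tendency to overestimate short canopies and underestimate tall ones. The pure-Python sketch below shows one common way to compute these, assuming R² is the coefficient of determination (1 - SS_res / SS_tot) and using the ordinary least-squares slope of predicted versus observed height as a proxy for sensitivity. The abstract specifies neither choice, and the function names and data are illustrative only.

```python
import math


def rmse(obs: list[float], pred: list[float]) -> float:
    """Root-mean-square error, in the units of the inputs (metres here)."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs: list[float], pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def sensitivity_slope(obs: list[float], pred: list[float]) -> float:
    """OLS slope of predicted on observed height.

    A slope below 1 indicates predictions compressed toward the mean:
    short stands tend to be overestimated and tall stands underestimated.
    """
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return cov / var


# Made-up field heights (m) and a model whose predictions are pulled toward the mean.
obs = [5.0, 10.0, 15.0, 20.0, 25.0]
pred = [10.0, 12.5, 15.0, 17.5, 20.0]
print(round(rmse(obs, pred), 3))     # 3.536
print(r_squared(obs, pred))          # 0.75
print(sensitivity_slope(obs, pred))  # 0.5
```

Here the predictions are an exact linear function of the observations, so the squared Pearson correlation would be 1.0 while the coefficient of determination is 0.75 and the slope is 0.5. A single R² value can therefore hide the low-sensitivity pattern the abstract describes, which is why a slope or a predicted-versus-observed scatter plot is a useful complement.
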

Source journal

International Journal of Applied Earth Observation and Geoinformation: ITC Journal
Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences

CiteScore: 12.00
Self-citation rate: 0.00%
Review time: 77 days

Journal description: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that use Earth observation data for natural resource and environmental inventory and management. These data come primarily from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. The journal addresses natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns such as biodiversity, land degradation, and hazards, and explores both conceptual and data-driven approaches. It covers geoinformation themes such as data capture, databasing, visualization, interpretation, data quality, and spatial uncertainty.