Assessing the potential of synthetic and ex situ airborne laser scanning and ground plot data to train forest biomass models

IF 3.2 | Zone 2, Agricultural and Forestry Sciences | JCR Q1, Forestry

Forestry. Pub Date: 2023-12-05. DOI: 10.1093/forestry/cpad061

Jannika Schäfer, Lukas Winiwarter, Hannah Weiser, Jan Novotný, Bernhard Höfle, Sebastian Schmidtlein, Hans Henniger, Grzegorz Krok, Krzysztof Stereńczak, Fabian Ewald Fassnacht



Abstract

Airborne laser scanning data are increasingly used to predict forest biomass over large areas. Biomass information cannot be derived directly from airborne laser scanning data; therefore, field measurements of forest plots are required to build regression models. We tested whether simulated laser scanning data of virtual forest plots could be used to train biomass models and thereby reduce the amount of field measurements required. We compared the performance of models that were trained with (i) simulated data only, (ii) a combination of simulated and real data, (iii) real data collected from different study sites, and (iv) real data collected from the same study site the model was applied to. We additionally investigated whether using a subset of the simulated data instead of all simulated data improved model performance. The best matching subset of the simulated data was sampled by selecting, for each real forest plot, the simulated forest plot whose return height distribution profile had the highest correlation with that of the real plot. For comparison, a randomly selected subset was evaluated. Models were tested on four forest sites located in Poland, the Czech Republic, and Canada. Model performance was assessed by root mean squared error (RMSE), squared Pearson correlation coefficient ($r^{2}$), and mean error (ME) of observed and predicted biomass. We found that models trained solely with simulated data did not achieve the accuracy of models trained with real data (RMSE increase of 52–122%, $r^{2}$ decrease of 4–18%). However, model performance improved when only a subset of the simulated data was used (RMSE increase of 21–118%, $r^{2}$ decrease of 5–14% compared to the real data model), although differences in model performance between the best matching subset and a randomly selected subset were small. Using simulated data for model training always resulted in a strong underprediction of biomass. Extending sparse real training datasets with simulated data decreased RMSE and increased $r^{2}$, as long as no more than 12–346 real training samples were available, depending on the study site. For three of the four study sites, models trained with real data collected from other sites outperformed models trained with simulated data, and their RMSE and $r^{2}$ were similar to those of models trained with data from the respective sites. Our results indicate that simulated data cannot yet replace real data, but they can be helpful in some sites to extend training datasets when only a limited amount of real data is available.
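The subset selection and the three performance measures named in the abstract can be illustrated with a short sketch. This is not the authors' code. It assumes that each return height distribution profile is a fixed-length vector (for example, a histogram of return heights over common height bins), that the correlation is Pearson's, and that ME is computed as predicted minus observed, so that a negative value indicates underprediction. The function names `evaluate` and `best_matching_subset` are hypothetical.

```python
import numpy as np


def evaluate(observed, predicted):
    """Return RMSE, squared Pearson correlation, and mean error."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumed sign convention: negative mean error = underprediction.
    residuals = predicted - observed
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    r = np.corrcoef(observed, predicted)[0, 1]
    return {"rmse": rmse, "r2": float(r ** 2), "me": float(np.mean(residuals))}


def best_matching_subset(real_profiles, simulated_profiles):
    """For each real plot, return the index of the simulated plot whose
    height profile has the highest Pearson correlation with it.

    Both inputs are 2-D arrays with one profile per row and the same
    number of height bins per profile (an assumption of this sketch).
    Profiles must not be constant, otherwise the correlation is undefined.
    """
    real = np.asarray(real_profiles, dtype=float)
    sim = np.asarray(simulated_profiles, dtype=float)
    # Standardise each profile so that a scaled dot product is Pearson's r.
    real_z = (real - real.mean(axis=1, keepdims=True)) / real.std(axis=1, keepdims=True)
    sim_z = (sim - sim.mean(axis=1, keepdims=True)) / sim.std(axis=1, keepdims=True)
    corr = real_z @ sim_z.T / real.shape[1]  # shape: (n_real, n_simulated)
    return corr.argmax(axis=1)


if __name__ == "__main__":
    # Toy profiles, invented purely for illustration.
    real = [[0, 1, 2, 3], [3, 2, 1, 0]]
    simulated = [[3, 2, 1, 1], [0, 2, 4, 6], [1, 1, 2, 1]]
    print(best_matching_subset(real, simulated))
    print(evaluate([100, 200, 300], [90, 180, 270]))
```

In this sketch the same simulated plot may be selected for more than one real plot; the abstract does not say how such duplicates were handled.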


Source journal

Forestry (Agricultural and Forestry Sciences: Forestry)

CiteScore: 6.70

Self-citation rate: 7.10%

Review time: 12-24 weeks

Journal description: The journal is inclusive of all subjects, geographical zones and study locations, including trees in urban environments, plantations and natural forests. We welcome papers that consider economic, environmental and social factors and, in particular, studies that take an integrated approach to sustainable management. In considering suitability for publication, attention is given to the originality of contributions and their likely impact on policy and practice, as well as their contribution to the development of knowledge. Special Issues: each year one edition of Forestry will be a Special Issue and will focus on one subject in detail; this will usually be by publication of the proceedings of an international meeting.