Modelling point mass balance for the glaciers of the Central European Alps using machine learning techniques
Authors: Ritu Anilkumar, R. Bharti, D. Chutia, S. Aggarwal
Journal: The Cryosphere (impact factor 4.4; JCR Q1, Geography, Physical)
Published: 2023-07-13
DOI: https://doi.org/10.5194/tc-17-2811-2023
Citation count: 1
Abstract
Glacier mass balance is typically estimated using in situ measurements, remote sensing measurements, and physical and temperature index modelling techniques. With improved data collection and access to large datasets, data-driven techniques have recently gained prominence in modelling natural processes. The most common data-driven techniques in use today are linear regression models and, to some extent, non-linear machine learning models such as artificial neural networks. However, the full range of machine learning capabilities has not yet been applied to glacier mass balance modelling. This study uses monthly meteorological data from ERA5-Land to drive four machine learning models: random forest (ensemble tree type), gradient-boosted regressor (ensemble tree type), support vector machine (kernel type), and artificial neural network (neural type). We also use ordinary least squares linear regression as a baseline against which to compare the machine learning models. Further, we assess each model's data requirements and its need for hyperparameter tuning. Finally, we estimate the importance of each meteorological variable to each model's mass balance estimate using permutation importance. All machine learning models outperformed the linear regression model. The neural network model showed a low bias, suggesting the possibility of improved results when the input data are biased. However, the ensemble tree-based models, random forest and gradient-boosted regressor, outperformed all other models in terms of the evaluation metrics and the interpretability of the meteorological variables. The gradient-boosted regression model achieved the best coefficient of determination, 0.713, with a root mean squared error of 1.071 m w.e. The feature importance values for all machine learning models indicated a high importance of meteorological variables associated with ablation, in line with the predominantly negative mass balance observations. We conclude that machine learning techniques are promising for estimating glacier mass balance and can incorporate information from more significant meteorological variables, as opposed to the simplified set of variables used in temperature index models.
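The evaluation quantities named in the abstract (coefficient of determination, root mean squared error, and permutation importance) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic rather than ERA5-Land, the three feature names are hypothetical, `toy_model` is a hand-written stand-in for a fitted regressor, and using the drop in R² as the permutation score is one common choice, not necessarily the one used in the study.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (here m w.e.)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when one feature column is shuffled at a time.

    `predict` maps one feature row (a list) to one prediction, so any
    fitted model can be wrapped and passed in.
    """
    rng = random.Random(seed)
    baseline = r2(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Synthetic stand-in data (assumption): standardized temperature,
# precipitation, and an irrelevant variable; target is mass balance in m w.e.
feature_names = ["temperature", "precipitation", "irrelevant"]  # hypothetical
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in feature_names] for _ in range(200)]
y = [-0.8 * t + 0.3 * p + data_rng.gauss(0, 0.2) for t, p, _ in X]


def toy_model(row):
    """Stand-in for a fitted regressor (assumption, not the study's model)."""
    temperature, precipitation, _unused = row
    return -0.8 * temperature + 0.3 * precipitation


baseline_r2, importances = permutation_importance(toy_model, X, y)
print("R^2:", round(baseline_r2, 3))
print("RMSE (m w.e.):", round(rmse(y, [toy_model(row) for row in X]), 3))
for name, score in zip(feature_names, importances):
    print(name, round(score, 3))
```

Because `permutation_importance` only needs a prediction function, the same routine can score a linear regression, a tree ensemble, a support vector machine, or a neural network, which is what makes the measure comparable across the model families the abstract lists.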
About the journal:
The Cryosphere (TC) is a not-for-profit international scientific journal dedicated to the publication and discussion of research articles, short communications, and review papers on all aspects of frozen water and ground on Earth and on other planetary bodies.
The main subject areas are the following:
ice sheets and glaciers;
planetary ice bodies;
permafrost and seasonally frozen ground;
seasonal snow cover;
sea ice;
river and lake ice;
remote sensing, numerical modelling, in situ and laboratory studies of the above and including studies of the interaction of the cryosphere with the rest of the climate system.