Improving global soil moisture prediction through cluster-averaged sampling strategy

IF 5.6 | Zone 1, Agricultural and Forestry Sciences | Q1 SOIL SCIENCE

Geoderma Pub Date : 2024-08-13 DOI:10.1016/j.geoderma.2024.116999


Citations: 0



Improving global soil moisture prediction through cluster-averaged sampling strategy

Understanding and predicting global soil moisture (SM) is crucial for water resource management and agricultural production. Although deep learning (DL) methods have shown strong performance in SM prediction, imbalances among training samples with different characteristics remain a significant challenge. We propose that improving the diversity and balance of the training samples in each batch during gradient descent can help address this issue. To test this hypothesis, we developed a Cluster-Averaged Sampling (CAS) strategy based on unsupervised learning. CAS trains the model on data sampled evenly from different clusters, ensuring both sample diversity and numerical consistency within each cluster. This prevents the model from overemphasizing specific sample characteristics and leads to more balanced feature learning. Experiments on the LandBench1.0 dataset, run with five different seeds for 1-day lead-time global predictions, show that CAS outperforms several Long Short-Term Memory (LSTM)-based models that do not employ this strategy. The median coefficient of determination (R²) improved by 2.36% to 4.31%, while Kling-Gupta Efficiency (KGE) improved by 1.95% to 3.16%. In high-latitude areas, R² improvements exceeded 40% in specific regions. To further validate CAS under realistic conditions, we tested it on Soil Moisture Active Passive Level 3 (SMAP-L3) satellite data for 1- to 3-day lead-time global predictions, which confirmed its efficacy. The study substantiates the CAS strategy and introduces a novel training method for enhancing the generalization of DL models.
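The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which clustering algorithm, features, or batch sizes CAS uses, so the k-means step, the function names `kmeans` and `cluster_balanced_batches`, and the toy data below are all assumptions made for illustration. The sketch clusters samples without labels and then builds each training batch from the same number of draws per cluster, so a dominant sample type cannot crowd out a rare one.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (Lloyd's algorithm); returns one cluster label per point.

    Assumption: k-means stands in for whatever unsupervised method CAS uses.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def cluster_balanced_batches(labels, per_cluster, n_batches, seed=0):
    """Yield batches of sample indices with `per_cluster` draws from every cluster."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)
    for _ in range(n_batches):
        batch = []
        for lab in sorted(by_cluster):
            # Draw with replacement so small clusters can still fill their quota.
            batch.extend(rng.choices(by_cluster[lab], k=per_cluster))
        rng.shuffle(batch)
        yield batch


# Toy, deliberately imbalanced data: 90 samples of one type, 10 of another.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(90)]
points += [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(10)]

labels = kmeans(points, k=2)
for batch in cluster_balanced_batches(labels, per_cluster=4, n_batches=3):
    # Each batch holds the same number of indices from every cluster found.
    print(sorted(labels[i] for i in batch))
```

In a real training loop the yielded indices would select the rows fed to the model at each gradient step. Drawing with replacement is one possible design choice for clusters smaller than the per-cluster quota; the abstract does not state how the paper handles that case.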


Source journal: Geoderma (Agricultural and Forestry Sciences, Soil Science)

CiteScore: 11.80

Self-citation rate: 6.60%

Publication volume: 597

Review time: 58 days

About the journal: Geoderma, the global journal of soil science, welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.