Combining production ecology principles with random forest to model potato yield in China

Impact factor: 5.6; Zone 1, Agricultural and Forestry Sciences; JCR Q1, Agronomy
Qiuhong Huang , Gerard B.M. Heuvelink , Ping He , Johan G.B. Leenaars , Antonius G.T. Schut
Field Crops Research, Volume 319, Article 109619. Published 2024-11-04. DOI: 10.1016/j.fcr.2024.109619
https://www.sciencedirect.com/science/article/pii/S0378429024003721

Abstract

Context

The random forest model (RF) has been widely applied for crop yield prediction. However, extrapolation, measurement errors, and uncertainty arising from limited predictive power of covariates may affect the model performance.

Objective

This study aimed to interpret and assess the accuracy of RF for potato yield prediction in China and to quantify the main sources of uncertainty using C.T. de Wit’s three-quadrant diagram.

Methods

A dataset including 2182 plot-year combinations was derived from 63 potato field experiments covering nine Chinese provinces and three years. Model performance was evaluated by 10-fold cross-validation (CV), leave-block-out (LBOCV), leave-site-out (LSOCV), and leave-year-out cross-validation (LYOCV).
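
The three leave-out schemes differ only in which label defines a fold: all plots sharing that label are held out together, so the model is always tested on a block, site, or year it has not seen. The sketch below illustrates this with a small pure-Python splitter. The records, treatment names, and the `leave_group_out` helper are hypothetical and are not taken from the study; in particular, treating a block as unique within a site and year is an assumption about how the grouping could be set up.

```python
from collections import defaultdict


def leave_group_out(groups):
    """Yield (held_out_label, train_idx, test_idx) once per distinct label."""
    by_group = defaultdict(list)
    for i, label in enumerate(groups):
        by_group[label].append(i)
    for label in sorted(by_group):
        test_idx = by_group[label]
        held = set(test_idx)
        train_idx = [i for i in range(len(groups)) if i not in held]
        yield label, train_idx, test_idx


# Hypothetical plot-year records: (site, year, block, treatment).
records = [
    ("A", 2017, 1, "NPK"), ("A", 2017, 1, "-N"),
    ("A", 2018, 1, "NPK"), ("A", 2018, 1, "-N"),
    ("B", 2017, 1, "NPK"), ("B", 2017, 1, "-N"),
    ("B", 2018, 1, "NPK"), ("B", 2018, 1, "-N"),
]

# Leave-site-out: every plot of one site forms the test set.
site_folds = list(leave_group_out([r[0] for r in records]))
# Leave-year-out: every plot of one year forms the test set.
year_folds = list(leave_group_out([r[1] for r in records]))
# Leave-block-out: assumed here to mean one block within one site and year.
block_folds = list(leave_group_out([(r[0], r[1], r[2]) for r in records]))

for label, train_idx, test_idx in site_folds:
    print(label, "train:", train_idx, "test:", test_idx)
```

Plain 10-fold CV, by contrast, assigns plots to folds at random, so plots from the same block, site, and year can appear in both the training and the test set.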

Results

The root mean square error (RMSE) was 3.5, 8.3, 9.9 and 10.3 t ha⁻¹, while the model efficiency coefficient (MEC) was 0.92, 0.64, 0.52 and 0.43 for 10-fold CV, LBOCV, LSOCV and LYOCV, respectively. Cumulated sunshine duration and topography position index were the most important covariates, while fertiliser variables were identified as least important for yield modelling. The standard deviation of the yield replicate variability estimated by a linear model accounted for 32 % of the RMSE for LSOCV. Introducing measured uptake of nutrient omission treatments, uptake of all treatments, and yields of nutrient omission treatments as additional covariates decreased the LSOCV RMSE by 2.3 t ha⁻¹ on average.
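
As a reading aid for the two metrics, here is a minimal sketch of how RMSE and MEC can be computed from held-out predictions. It assumes MEC takes the common form of one minus the ratio of the squared prediction error to the squared deviation of the observations from their mean; the abstract does not spell out the formula. The yield values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mec(observed, predicted):
    """Model efficiency coefficient, assumed here to be 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical yields in t/ha, not data from the study.
observed = [20.0, 30.0, 40.0, 50.0]
predicted = [22.0, 28.0, 43.0, 47.0]

rmse_value = rmse(observed, predicted)
mec_value = mec(observed, predicted)
print(f"RMSE = {rmse_value:.2f} t/ha, MEC = {mec_value:.3f}")
```

Under this definition, an MEC of 1 means perfect prediction and an MEC of 0 means the model does no better than predicting the mean observed yield. On the reported figures, 32 % of the LSOCV RMSE of 9.9 t ha⁻¹ corresponds to roughly 3.2 t ha⁻¹.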

Conclusions

The fitted models could explain up to 92 % of potato yield variability in China, although there was a considerable residual error when extrapolating to other areas or years. Yield replicate variability accounted for one-third of the residual error. Information about physiological efficiency was the main source of uncertainty, followed by available soil nutrients. Fertiliser recovery was least important because most of the experiments were conducted in fertile fields.

Implications

Combining an RF model with the three-quadrant diagram makes it possible to better explain yield prediction uncertainty. The methodology used in this study can be applied to other crops, countries and data-driven models.

Source journal

Field Crops Research (Agricultural and Forestry Sciences: Agronomy)
CiteScore: 9.60
Self-citation rate: 12.10%
Articles published: 307
Review time: 46 days
Journal description: Field Crops Research is an international journal publishing scientific articles on experimental and modelling research at field, farm and landscape levels on temperate and tropical crops and cropping systems, with a focus on crop ecology and physiology, agronomy, and plant genetics and breeding.