Machine learning based on functional principal component analysis to quantify the effects of the main drivers of wheat yields

IF 5.5 | Zone 1, Agricultural and Forestry Sciences | Q1 AGRONOMY

European Journal of Agronomy, Volume 159, Article 127254. Pub Date: 2024-06-28. DOI: 10.1016/j.eja.2024.127254

Florent Bonneu, David Makowski, Julien Joly, Denis Allard


Citations: 0

Abstract

Assessing the response of crop yield to year-to-year climate variability at the field scale is often done using process-based models and regression techniques. Although powerful, these tools rely on strong assumptions and can lead to substantial prediction errors. In this study, we investigate the use of a flexible machine learning algorithm that combines Functional Principal Component Analysis and Random Forest to relate field-scale wheat yield to local daily climate variables. Instead of computing seasonal, monthly or other arbitrary time-frame climate averages, we use Functional Principal Component Analysis to decompose climate time series into a few data-driven basis functions, called Principal Curves, which summarize the dynamics of key climate variables with a limited number of interpretable components. The scores associated with these components are then used as inputs to a Random Forest algorithm, both for yield prediction and for analysing the main factors responsible for yield variability. To evaluate our approach, we use a French national database that includes wheat yield, climate and management practice data for 298 farm fields in four main producing regions from 2011 to 2016. Depending on the region, our approach explains 62% to 81% of the yield variability when both agronomic and climate variables are included, 56–81% when agronomic variables are ignored and 51–74% when climate variables are ignored. Based on a year-by-year cross-validation, RMSE ranges from 0.5 t ha⁻¹ to 2.1 t ha⁻¹ in non-extreme years (2012–2015). However, prediction error can reach 3.6 t ha⁻¹ under exceptional weather conditions, such as those experienced in 2016 in Northern France. We find that this new approach performs better in most cases than the same machine learning algorithm using the usual time averages of climate variables, and it does not require choosing an arbitrary time frame. We then show how important patterns in weather time series can be identified and how their effects on yield can be interpreted using the proposed modelling framework.
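The pipeline summarized in the abstract (daily climate curves, then FPCA scores, then a regression model evaluated by year-by-year cross-validation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: FPCA is approximated by an SVD of curves observed on a common, regular daily grid (the paper may instead use smoothed basis expansions); the data are synthetic; the function names `fpca_scores`, `leave_one_year_out_rmse` and `ols_fit_predict` are hypothetical; and ordinary least squares stands in for the Random Forest so that the example needs only NumPy.

```python
import numpy as np


def fpca_scores(curves, n_components=3, dt=1.0):
    """Discretized FPCA of curves sampled on a common, regular grid (assumption).

    curves: array (n_curves, n_days), one daily climate series per field-year.
    Returns the mean curve, the first `n_components` principal curves
    (unit L2 norm under the rectangle rule), the scores and the
    fraction of total variance explained by each retained component.
    """
    curves = np.asarray(curves, dtype=float)
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal eigenvectors of the (unscaled) sample covariance
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Rescale so that sum(phi_k**2) * dt == 1 (discrete L2 normalisation)
    principal_curves = vt[:n_components] / np.sqrt(dt)
    # Score_ik approximates the integral of (X_i(t) - mean(t)) * phi_k(t) dt
    scores = centered @ principal_curves.T * dt
    explained = s**2 / np.sum(s**2)
    return mean_curve, principal_curves, scores, explained[:n_components]


def leave_one_year_out_rmse(X, y, years, fit_predict):
    """Year-by-year cross-validation: hold out one year, train on the others."""
    X, y, years = np.asarray(X, float), np.asarray(y, float), np.asarray(years)
    rmse = {}
    for year in np.unique(years):
        test = years == year
        pred = fit_predict(X[~test], y[~test], X[test])
        rmse[int(year)] = float(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return rmse


def ols_fit_predict(X_train, y_train, X_test):
    # Stand-in regressor: ordinary least squares with an intercept (NOT a random forest)
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef


# Synthetic illustration only: 6 seasons x 50 fields, 240-day temperature curves
rng = np.random.default_rng(0)
n_years, n_fields, n_days = 6, 50, 240
t = np.arange(n_days)
years = np.repeat(np.arange(2011, 2011 + n_years), n_fields)
anomaly = rng.normal(0.0, 2.0, size=years.size)  # season-long warm/cool offset per curve
seasonal = 10 + 8 * np.sin(np.pi * t / n_days)
curves = seasonal[None, :] + anomaly[:, None] + rng.normal(0.0, 1.0, (years.size, n_days))
yield_t_ha = 8.0 - 0.3 * anomaly + rng.normal(0.0, 0.5, years.size)

mean_curve, phi, scores, explained = fpca_scores(curves, n_components=3)
cv_rmse = leave_one_year_out_rmse(scores, yield_t_ha, years, ols_fit_predict)
print(explained.round(3), cv_rmse)
```

To mirror the study more closely, `ols_fit_predict` could be replaced by a wrapper that fits scikit-learn's `RandomForestRegressor` on the training scores (plus any agronomic covariates) and predicts the held-out year. For simplicity, the FPCA above is fitted on all curves at once; a stricter evaluation would refit it within each training fold so that the held-out year does not influence the principal curves.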




Source journal

European Journal of Agronomy (Agricultural and Forestry Sciences, Agronomy)

CiteScore: 8.30
Self-citation rate: 7.70%
Articles published: 187
Review time: 4.5 months

About the journal: The European Journal of Agronomy, the official journal of the European Society for Agronomy, publishes original research papers reporting experimental and theoretical contributions to field-based agronomy and crop science. The journal will consider field-level research on agricultural, horticultural and tree crops that uses comprehensive and explanatory approaches. The EJA covers the following topics: crop physiology; crop production and management, including irrigation, fertilization and soil management; agroclimatology and modelling; plant-soil relationships; crop quality and post-harvest physiology; farming and cropping systems; agroecosystems and the environment; crop-weed interactions and management; organic farming; horticultural crops; and papers from the European Society for Agronomy bi-annual meetings. In determining the suitability of submitted articles for publication, particular scrutiny is placed on the novelty and significance of the research and on the extent to which it adds to existing knowledge in agronomy.