Machine learning based on functional principal component analysis to quantify the effects of the main drivers of wheat yields
Florent Bonneu, David Makowski, Julien Joly, Denis Allard
European Journal of Agronomy (IF 4.5, JCR Q1, Agronomy), published 2024-06-28
DOI: 10.1016/j.eja.2024.127254
https://www.sciencedirect.com/science/article/pii/S1161030124001758
Citations: 0
Abstract
Assessing the response of crop yield to year-to-year climate variability at the field scale is often done using process-based models and regression techniques. Although powerful, these tools rely on strong assumptions and can lead to substantial prediction errors. In this study, we investigate the use of a flexible machine learning algorithm combining Functional Principal Component Analysis and Random Forest to relate field-scale wheat yield to local daily climate variables. Instead of computing seasonal, monthly or any other arbitrary time-frame climate averages, climate time series are decomposed by Functional Principal Component Analysis into a few data-driven basis functions, called Principal Curves, in order to summarize the dynamics of key climate variables with a limited number of interpretable components. Scores associated with these components are then used as inputs to a Random Forest algorithm for yield prediction and for analysing the important factors responsible for yield variability. To evaluate our approach, we use a French national database including wheat yield data as well as climate and management practice data for 298 farm fields from 2011 to 2016 in four main producing regions. Depending on the region, our approach explains 62 % to 81 % of the yield variability when both agronomic and climate variables are included, compared with 56–81 % when ignoring agronomic variables and 51–74 % when ignoring climate variables. Based on a year-by-year cross-validation, RMSE ranges from 0.5 t ha⁻¹ to 2.1 t ha⁻¹ in non-extreme years (2012–2015). However, prediction error can reach 3.6 t ha⁻¹ under exceptional weather conditions, such as those experienced in 2016 in Northern France. We find that this new approach performs in most cases better than the same machine learning algorithm using the usual time averages of climate variables, without the need to choose an arbitrary time-frame. We then show how important patterns in weather time series can be identified and how their effects on yield can be interpreted using the proposed modelling framework.
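As a rough illustration of the pipeline described in the abstract, the Python sketch below decomposes daily climate curves into a few principal curves, feeds the resulting scores to a Random Forest, and evaluates it with a year-by-year cross-validation. It is not the authors' implementation. FPCA is approximated here by ordinary PCA (via SVD) on curves sampled on a common daily grid, without the smoothing step that a full functional analysis would typically include. The data are synthetic, and the names `fpca_scores`, `year_by_year_rmse` and `nitrogen` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fpca_scores(curves, n_components):
    """Discretised FPCA: PCA of mean-centred daily curves via SVD.

    curves: array (n_fields, n_days), one daily climate series per row.
    Returns (mean_curve, principal_curves, scores, explained_variance_ratio).
    """
    mean_curve = curves.mean(axis=0)
    centred = curves - mean_curve
    # Rows of vt are orthonormal basis functions over the day axis.
    _, sing, vt = np.linalg.svd(centred, full_matrices=False)
    principal_curves = vt[:n_components]
    scores = centred @ principal_curves.T
    explained = sing[:n_components] ** 2 / np.sum(sing ** 2)
    return mean_curve, principal_curves, scores, explained


def year_by_year_rmse(features, yields, years, seed=0):
    """Leave-one-year-out cross-validation; returns {year: RMSE}."""
    rmse = {}
    for year in np.unique(years):
        test = years == year
        model = RandomForestRegressor(n_estimators=100, random_state=seed)
        model.fit(features[~test], yields[~test])
        errors = model.predict(features[test]) - yields[test]
        rmse[int(year)] = float(np.sqrt(np.mean(errors ** 2)))
    return rmse


# Synthetic stand-in data (assumption): 120 fields, 200 days, 6 years.
rng = np.random.default_rng(0)
n_fields, n_days = 120, 200
days = np.linspace(0.0, 1.0, n_days)
years = np.repeat(np.arange(2011, 2017), n_fields // 6)

# Two hypothetical modes of variation: overall warmth and a seasonal contrast.
a = rng.normal(size=n_fields)
b = rng.normal(size=n_fields)
temperature = (
    10 + 8 * np.sin(np.pi * days)
    + 3 * a[:, None]
    + 2 * b[:, None] * np.cos(2 * np.pi * days)
    + rng.normal(scale=0.5, size=(n_fields, n_days))
)
nitrogen = rng.uniform(100, 200, size=n_fields)  # hypothetical agronomic input
yields = (
    7 + 0.8 * a - 0.5 * b + 0.01 * (nitrogen - 150)
    + rng.normal(scale=0.3, size=n_fields)
)

# For simplicity FPCA is fitted on all curves at once; a stricter protocol
# would refit it on the training years inside each cross-validation fold.
_, curves, scores, explained = fpca_scores(temperature, n_components=2)
features = np.column_stack([scores, nitrogen])
rmse_by_year = year_by_year_rmse(features, yields, years)

print("explained variance ratio:", explained)
print("RMSE by held-out year:", rmse_by_year)
```

Holding out a whole year at a time, rather than random fields, mirrors the year-by-year cross-validation mentioned above: it tests how well the model extrapolates to weather conditions it has not seen, which is where errors are expected to grow in exceptional years.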
About the journal:
The European Journal of Agronomy, the official journal of the European Society for Agronomy, publishes original research papers reporting experimental and theoretical contributions to field-based agronomy and crop science. The journal will consider research at the field level for agricultural, horticultural and tree crops that uses comprehensive and explanatory approaches. The EJA covers the following topics:
crop physiology
crop production and management including irrigation, fertilization and soil management
agroclimatology and modelling
plant-soil relationships
crop quality and post-harvest physiology
farming and cropping systems
agroecosystems and the environment
crop-weed interactions and management
organic farming
horticultural crops
papers from the European Society for Agronomy bi-annual meetings
In determining the suitability of submitted articles for publication, particular scrutiny is placed on the degree of novelty and significance of the research and the extent to which it adds to existing knowledge in agronomy.