On crop yield modelling, predicting, and forecasting and addressing the common issues in published studies

IF 5.4 | Zone 2, Agricultural and Forestry Sciences | Q1 AGRICULTURE, MULTIDISCIPLINARY

Precision Agriculture. Published: 2024-12-07. DOI: 10.1007/s11119-024-10212-2

Patrick Filippi, Si Yang Han, Thomas F.A. Bishop

Citations: 0

Abstract

There has been a recent surge in the number of studies that aim to model crop yield using data-driven approaches. This has largely been driven by the growing availability of remote sensing data (e.g. satellite imagery) and precision agriculture data (e.g. high-resolution crop yield monitor data), as well as the abundance of machine learning modelling approaches. However, several common issues in published studies in the field of precision agriculture (PA) must be addressed. These include the terminology used for crop yield modelling, predicting, forecasting, and interpolating, as well as the way that models are calibrated and validated. In a typical example, a study takes a crop yield map, or several plots within a field, from a single season, builds a model with satellite or Unmanned Aerial Vehicle (UAV) imagery, validates it using data-splitting or some form of cross-validation (e.g. k-fold), and calls the result a ‘prediction’ or ‘forecast’ of crop yield. This is a problem because the approach does not test the forecasting ability of the model: the model is calibrated and validated on the same season, which substantially overestimates its value for decision-making, such as an in-season fertiliser application. This is an all-too-common flaw in the logic of many published studies. Moving forward, clear definitions and guidelines for data-driven yield modelling and validation are essential, so that the outputs and outcomes of a study connect more closely with its stated goal.

To demonstrate this, the current study uses a case study dataset from a collection of large neighbouring farms in New South Wales, Australia. The dataset includes 160 yield maps of winter wheat (Triticum aestivum) covering 26,400 hectares over a 10-year period (2014–2023). Machine learning crop yield models are built at 30 m spatial resolution with a suite of predictor data layers related to crop yield, including datasets that represent soil variation, terrain, weather, and satellite imagery of the crop. Predictions are made at both within-field (30 m) and field resolution. Because crop yield predictions serve an array of applications, four experiments were set up to reflect different scenarios: Experiment 1, forecasting yield mid-season (e.g. for mid-season fertilisation); Experiment 2, forecasting yield late-season (e.g. for late-season logistics or forward selling); Experiment 3, predicting yield in a previous season for a field with no yield data in that season; and Experiment 4, predicting yield in a previous season for a field with some yield data in that season (e.g. two combine harvesters were used, but only one was fitted with a yield monitor). This study shows how different model calibration and validation approaches clearly affect prediction quality, and therefore how the results of data-driven crop yield modelling studies should be interpreted. This is key to ensuring that the wealth of data-driven crop yield modelling studies not only contributes to the science but also delivers real value to growers, industry, and governments.
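
The validation problem described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: the data are synthetic (hypothetical function `simulate`), each season receives a random yield shift standing in for weather, and a k-nearest-neighbour interpolator on pixel coordinates (hypothetical `knn_predict`) stands in for a machine learning yield model. Random k-fold cross-validation inside one season rewards the model for interpolating between pixels of a season it has already seen; holding out whole seasons is closer to forecasting an unseen season.

```python
import random
import statistics


def simulate(n_seasons=10, n_pixels=100, seed=1):
    """Hypothetical synthetic 30 m yield pixels: rows of (season, x, y, yield_t_ha)."""
    rng = random.Random(seed)
    # The same pixel locations (on a unit square) are observed in every season.
    pixels = [(rng.random(), rng.random()) for _ in range(n_pixels)]
    rows = []
    for season in range(n_seasons):
        # Assumed season-wide shift, standing in for weather-driven differences (t/ha).
        season_effect = rng.gauss(0.0, 1.0)
        for x, y in pixels:
            spatial = 2.0 * x + 1.0 * y  # assumed persistent within-field pattern (e.g. soil)
            noise = rng.gauss(0.0, 0.2)
            rows.append((season, x, y, 3.0 + season_effect + spatial + noise))
    return rows


def knn_predict(train, x, y, k=5):
    """Mean yield of the k training rows closest in space (a simple interpolator)."""
    nearest = sorted(train, key=lambda r: (r[1] - x) ** 2 + (r[2] - y) ** 2)[:k]
    return statistics.fmean(r[3] for r in nearest)


def rmse(pairs):
    """Root mean squared error over (observed, predicted) pairs."""
    return statistics.fmean((obs - pred) ** 2 for obs, pred in pairs) ** 0.5


def within_season_kfold(rows, k_folds=5, seed=0):
    """Random k-fold CV inside each season: calibration and validation share a season."""
    rng = random.Random(seed)
    pairs = []
    for season in sorted({r[0] for r in rows}):
        season_rows = [r for r in rows if r[0] == season]
        rng.shuffle(season_rows)
        for fold in range(k_folds):
            test = season_rows[fold::k_folds]
            train = [r for i, r in enumerate(season_rows) if i % k_folds != fold]
            pairs += [(r[3], knn_predict(train, r[1], r[2])) for r in test]
    return rmse(pairs)


def leave_one_season_out(rows):
    """Hold out a whole season; the model is never calibrated on the season it is scored on."""
    pairs = []
    for season in sorted({r[0] for r in rows}):
        train = [r for r in rows if r[0] != season]
        test = [r for r in rows if r[0] == season]
        pairs += [(r[3], knn_predict(train, r[1], r[2])) for r in test]
    return rmse(pairs)


if __name__ == "__main__":
    rows = simulate()
    print(f"within-season random k-fold RMSE: {within_season_kfold(rows):.2f} t/ha")
    print(f"leave-one-season-out RMSE:        {leave_one_season_out(rows):.2f} t/ha")
```

Under these assumptions the within-season score should come out much lower than the leave-one-season-out score, even though the model is identical; only the validation design changed. Leave-one-season-out alone does not reproduce the paper's Experiments 1 and 2, which would also require restricting predictors to data available at the forecast date (mid- or late-season); that restriction is not modelled in this sketch.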

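Source journal

Precision Agriculture (Agricultural and Forestry Sciences: Agriculture, Multidisciplinary)

CiteScore: 12.30

Self-citation rate: 8.10%

Articles published: 103

Review time: >24 weeks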

Journal description: Precision Agriculture promotes the most innovative results coming from research in the field of precision agriculture. It provides an effective forum for disseminating original and fundamental research and experience in the rapidly advancing area of precision farming. Because the field covers many topics, those addressed include, but are not limited to:

Natural Resources Variability: soil and landscape variability, digital elevation models, soil mapping, geostatistics, geographic information systems, microclimate, weather forecasting, remote sensing, management units, scale, etc.

Managing Variability: sampling techniques, site-specific nutrient and crop protection chemical recommendations, crop quality, tillage, seed density, seed variety, yield mapping, remote sensing, record keeping systems, data interpretation and use, crops (corn, wheat, sugar beets, potatoes, peanut, cotton, vegetables, etc.), management scale, etc.

Engineering Technology: computers, positioning systems, DGPS, machinery, tillage, planting, nutrient and crop protection implements, manure, irrigation, fertigation, yield monitoring and mapping, soil physical and chemical characteristic sensors, weed/pest mapping, etc.

Profitability: MEY, net returns, BMPs, optimum recommendations, crop quality, technology cost, sustainability, social impacts, marketing, cooperatives, farm scale, crop type, etc.

Environment: nutrients, crop protection chemicals, sediments, leaching, runoff, practices, field, watershed, on/off farm, artificial drainage, groundwater, surface water, etc.

Technology Transfer: skill needs, education, training, outreach, methods, surveys, agri-business, producers, distance education, Internet, simulation models, decision support systems, expert systems, on-farm experimentation, partnerships, quality of rural life, etc.