Forecasting crop yield through a data-driven framework of remote sensing and biophysical knowledge: A case study for wheat and maize in the Guanzhong Plain, China
Zhikai Cheng, Xiaobo Gu, Yuanling Zhang, Tongtong Zhao, Shikun Sun, Yadan Du, Huanjie Cai
European Journal of Agronomy, Volume 175, Article 128038. Published 2026-04-01 (Epub 2026-02-12). DOI: 10.1016/j.eja.2026.128038. URL: https://www.sciencedirect.com/science/article/pii/S1161030126000572
Citations: 0
Abstract
Accurate early yield forecasts are essential for maximizing benefits and ensuring food security in the Guanzhong Plain, China. Process-based crop models are often constrained by uncertain input data, which limits their ability to forecast yield at the regional grid level (e.g., 1 km × 1 km). Statistical models ignore the biophysical mechanisms underlying crop growth and development, and their performance is limited by the quantity and quality of available training data. There is therefore an urgent need for a more comprehensive and robust grid-level approach to forecasting wheat and maize yield in the Guanzhong Plain. In this study, an interpretable data-driven framework was developed to forecast wheat and maize yields by combining remote sensing data (solar-induced chlorophyll fluorescence, SIF; spectral indices, SIs) with biophysical knowledge (APSIM outputs and extreme climatic events). A Bayesian integration model (BIM) was trained on high-quality synthetic datasets, generated with the synthetic minority oversampling technique for regression (SMOTER), to forecast harvest-time yield accurately at specific time windows. Integrating multi-source data reduced the yield prediction error: the overall normalized root mean square error (NRMSE) decreased by 0.6 %–39.0 % compared to the single-source models. The data-driven model trained on the SMOTER-based synthetic dataset achieved the highest yield forecasting accuracy (wheat: NRMSE = 16.2 %; maize: NRMSE = 20.7 %). SIF made the largest contribution to yield forecasts and showed strong interactions and synergies with other feature variables (e.g., aboveground biomass, drought, and low-temperature stress), further enhancing model performance. Overall, the proposed data-driven framework offers a promising way to improve grid-level yield forecasting and provides useful insights for the sustainable development of agricultural systems.
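The abstract reports accuracy as NRMSE but does not say how the RMSE is normalized. The sketch below assumes normalization by the mean observed yield, which is one common convention and may not be the one the authors used. The function name `nrmse` and the yield values are hypothetical and serve only to illustrate the metric.

```python
import math


def nrmse(observed, predicted):
    """Return RMSE as a percentage of the mean observed value.

    Assumption: normalization by the observed mean. Other conventions
    (e.g., dividing by the observed range) give different numbers.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    # Mean of the squared differences between observation and prediction
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_observed


# Hypothetical grid-cell yields in t/ha, not taken from the paper
observed_yield = [6.0, 7.0, 8.0]
predicted_yield = [6.5, 6.5, 8.5]
print(f"NRMSE = {nrmse(observed_yield, predicted_yield):.1f} %")
```

Every prediction here is off by 0.5 t/ha, so the RMSE is 0.5 t/ha against a mean observed yield of 7.0 t/ha, giving an NRMSE of about 7.1 %.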
About the journal:
The European Journal of Agronomy, the official journal of the European Society for Agronomy, publishes original research papers reporting experimental and theoretical contributions to field-based agronomy and crop science. The journal will consider field-level research on agricultural, horticultural and tree crops that uses comprehensive and explanatory approaches. The EJA covers the following topics:
crop physiology
crop production and management including irrigation, fertilization and soil management
agroclimatology and modelling
plant-soil relationships
crop quality and post-harvest physiology
farming and cropping systems
agroecosystems and the environment
crop-weed interactions and management
organic farming
horticultural crops
papers from the European Society for Agronomy bi-annual meetings
In determining the suitability of submitted articles for publication, particular scrutiny is placed on the degree of novelty and significance of the research and the extent to which it adds to existing knowledge in agronomy.