Predicting crop yield lows through the highs via binned deep imbalanced regression: A case study on vineyards
Hamid Kamangir, Brent S. Sams, Nick Dokoozlian, Luis Sanchez, J. Mason Earles
International Journal of Applied Earth Observation and Geoinformation, vol. 139, Article 104536. Published 2025-04-25. DOI: 10.1016/j.jag.2025.104536
https://www.sciencedirect.com/science/article/pii/S1569843225001839
Impact factor: 7.6. JCR: Q1 (Remote Sensing).
Citation count: 0
Abstract
Crop yield estimation is vital for agricultural management, but models often struggle to predict extreme values that can significantly impact operations and markets. Traditional models face challenges with these extremes, leading to biased and inaccurate predictions. To address this challenge, our study introduces two strategies. First, we propose a cost-sensitive loss function, ExtremeLoss, designed to better capture and represent less frequent yield values by giving greater importance to extreme cases during training. Second, we develop a conditional deep learning model that enhances feature representation by conditioning on a binned yield observation map. This conditioning encourages smoother and more coherent input feature maps across different segments of the yield value range by leveraging similarities within and across yield bins, ultimately improving the model's ability to generalize and distinguish between subtle variations in yield. The binned map defines "yield zone maps," grouping yields into classes (e.g., low extreme, common, high extreme) to improve the identification of yield variability; this conditioning input can be removed during inference. Our model was tested on a comprehensive grape yield dataset from 2016 to 2019, covering 2,200 hectares and 42 blocks of eight cultivars. We compared its performance against advanced techniques such as Focal-R loss, label distribution smoothing, dense weighting, and class-balanced methods under two validation scenarios: block-hold-out (BHO) and year-block-hold-out (YBHO). Our approach outperforms existing models in R-squared (R²), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). Notably, it reduces MAE by 2.98 and 14.45 t/ha for low and high extremes in the BHO scenario and by 7.18 and 11.05 t/ha in the YBHO scenario. It also decreases MAPE by 19.09% and 23.94% in the BHO scenario and by 33.76% and 19.61% in the YBHO scenario. Our model shows a marked improvement in capturing spatial variability and advances spatio-temporal yield estimation, particularly for extreme values in complex agricultural settings like vineyards.
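The abstract does not give the form of ExtremeLoss or the bin boundaries, so the following is only a rough sketch of the general idea it describes: group yields into zones (low extreme, common, high extreme) and weight rare zones more heavily in the training loss. The quantile thresholds (0.1 and 0.9), the inverse-frequency weighting, and the function names `bin_yields`, `zone_weights`, and `weighted_mae` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def bin_yields(y, low_q=0.1, high_q=0.9):
    """Assign each yield to a zone: 0 = low extreme, 1 = common, 2 = high extreme.

    The quantile cut-offs are illustrative assumptions, not values from the paper.
    """
    y = np.asarray(y, dtype=float)
    lo, hi = np.quantile(y, [low_q, high_q])
    # digitize: y < lo -> 0, lo <= y < hi -> 1, y >= hi -> 2
    return np.digitize(y, [lo, hi])


def zone_weights(zones, n_zones=3):
    """Per-sample inverse-frequency weights, scaled so their mean is 1."""
    zones = np.asarray(zones)
    counts = np.bincount(zones, minlength=n_zones).astype(float)
    present = counts > 0
    per_zone = np.zeros(n_zones)
    # Each non-empty zone receives the same total weight, so rare zones
    # get a larger weight per sample.
    per_zone[present] = len(zones) / (present.sum() * counts[present])
    return per_zone[zones]


def weighted_mae(y_true, y_pred, weights):
    """Cost-sensitive MAE: a generic stand-in, not the paper's ExtremeLoss."""
    err = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.sum(weights * err) / np.sum(weights))


# Toy yields (t/ha): mostly common values with one low and one high extreme.
y_true = np.array([1.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 20.0])
y_pred = np.full_like(y_true, 5.0)  # a model that always predicts the common value

zones = bin_yields(y_true)
weights = zone_weights(zones)
plain = float(np.mean(np.abs(y_true - y_pred)))
weighted = weighted_mae(y_true, y_pred, weights)
```

On this toy data, a model that always predicts the common value is penalized more under the weighted loss than under plain MAE, which is the behavior a cost-sensitive loss for extremes is meant to produce.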
About the journal:
The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.