Rafaella Pironato Amaro, Pierre Todoroff, Mathias Christina, Daniel Garbellini Duft, Ana Cláudia dos Santos Luciano
Computers and Electronics in Agriculture, Volume 237, Article 110522. Published 16 May 2025. DOI: 10.1016/j.compag.2025.110522. Available at: https://www.sciencedirect.com/science/article/pii/S0168169925006283
Performance evaluation of Sentinel-2 imagery, agronomic and climatic data for sugarcane yield estimation
Given the importance of the sugarcane sector, machine learning techniques are being used to improve yield estimation. This study uses the Random Forest (RF) algorithm to select the most relevant predictors from Sentinel-2 imagery, agronomic data, and climatic data in order to estimate sugarcane yield before harvest at a mill in western São Paulo state. The candidate predictors were: Sentinel-2 radiometric bands (Red-edge 1 to Red-edge 3, Red, NIR, SWIR 1, and SWIR 2); vegetation indices derived from Sentinel-2 multispectral reflectance (NDVIRE1 to NDVIRE3, EVI, CIRE1 to CIRE3, NDVI, NDWI1, NDWI2, SIWSI, NDMI, and SAVI); agronomic data (soil type, number of harvests, variety, and slope); and climatic and agroclimatic data (temperature, precipitation, radiation, and crop water balance). We built four datasets to train yield estimation models for the mill: (i) all variables; (ii) dataset (i) with the strongly correlated variables removed; (iii) the variables from dataset (ii) retained by feature selection based on the RF impurity index, which gave the best model results; and (iv) the 20 highest-ranked variables from dataset (i) according to SHapley Additive exPlanations (SHAP). With dataset (iii), the models achieved R² values from 0.58 to 0.70 and Willmott's index of agreement (d) from 0.83 to 0.89. The most relevant variables for estimating sugarcane yield were the number of harvests, climatic data, and vegetation indices built from the red-edge, narrow near-infrared, red, and SWIR bands.
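The abstract names several computational steps without spelling them out. The Python sketch below illustrates three of them under stated assumptions. First, a few of the listed vegetation indices in their common textbook forms (the paper's exact band pairings, for example which NIR band enters NDVIRE, are not given in the abstract and are assumed here). Second, a greedy filter that drops strongly correlated variables, as in dataset (ii); the 0.9 cutoff is a hypothetical choice, since the abstract gives no threshold. Third, the two reported accuracy metrics: R², computed here as 1 - SS_res/SS_tot (some studies instead report the squared Pearson correlation between observed and predicted values), and Willmott's index of agreement d. The RF impurity-based selection and the SHAP ranking are not reproduced; in Python they are commonly done with scikit-learn's `RandomForestRegressor` (whose `feature_importances_` attribute is impurity-based) and the `shap` package's `TreeExplainer`, but the abstract does not say which tools the authors used. All function names and numbers below are illustrative, not taken from the study.

```python
"""Sketch: spectral indices, correlation-based feature pruning, and the
agreement metrics (R^2, Willmott's d) named in the abstract.

Assumptions (not from the paper): index formulas are common textbook forms,
the 0.9 correlation cutoff is hypothetical, and all sample values are made up.
"""
from math import sqrt
from statistics import mean


def ndvi(nir: float, red: float) -> float:
    # Normalized Difference Vegetation Index.
    return (nir - red) / (nir + red)


def nd_red_edge(nir: float, red_edge: float) -> float:
    # Red-edge NDVI variant (NDVIRE); the NIR/red-edge pairing is an assumption.
    return (nir - red_edge) / (nir + red_edge)


def ci_red_edge(nir: float, red_edge: float) -> float:
    # Red-edge chlorophyll index (CIRE) in its common form NIR / RE - 1.
    return nir / red_edge - 1.0


def ndmi(nir: float, swir1: float) -> float:
    # Normalized Difference Moisture Index using SWIR 1.
    return (nir - swir1) / (nir + swir1)


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    # Soil-Adjusted Vegetation Index with the usual soil factor L = 0.5.
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def pearson(x: list[float], y: list[float]) -> float:
    # Pearson correlation coefficient between two equal-length series.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(features: dict[str, list[float]], cutoff: float = 0.9) -> list[str]:
    # Greedy filter: keep a feature only if |r| <= cutoff against every
    # feature already kept; earlier features win ties in correlated pairs.
    kept: list[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


def r_squared(obs: list[float], pred: list[float]) -> float:
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mo = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def willmott_d(obs: list[float], pred: list[float]) -> float:
    # Willmott's index of agreement:
    # d = 1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)
    mo = mean(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den


if __name__ == "__main__":
    # Hypothetical surface reflectances for four fields (not study data).
    nir = [0.42, 0.38, 0.30, 0.45]
    red = [0.05, 0.07, 0.10, 0.04]
    re1 = [0.12, 0.14, 0.16, 0.11]
    swir1 = [0.20, 0.22, 0.25, 0.19]
    feats = {
        "NDVI": [ndvi(n, r) for n, r in zip(nir, red)],
        "NDVIRE1": [nd_red_edge(n, e) for n, e in zip(nir, re1)],
        "CIRE1": [ci_red_edge(n, e) for n, e in zip(nir, re1)],
        "NDMI": [ndmi(n, s) for n, s in zip(nir, swir1)],
        "SAVI": [savi(n, r) for n, r in zip(nir, red)],
    }
    print("kept after correlation filter:", drop_correlated(feats))

    # Hypothetical observed vs. predicted yields (t/ha) for the same fields.
    obs, pred = [80.0, 72.0, 65.0, 90.0], [78.0, 75.0, 68.0, 85.0]
    print("R2:", round(r_squared(obs, pred), 3), "d:", round(willmott_d(obs, pred), 3))
```

Because the greedy filter keeps whichever variable of a correlated pair appears first, input order acts as an implicit priority; ranking variables beforehand (for example by their correlation with yield) is one way to make that choice deliberate.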
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.