Optimizing tomato yield prediction using phenologically timed UAV-based spectral data and machine learning
Authors: Carolina Trentin, Yiannis Ampatzidis, Sotirios Tasioulas, Pavlos Tsouvaltzis
Journal: Smart agricultural technology, volume 12, Article 101158 (impact factor 5.7; JCR Q1, Agricultural Engineering)
Publication date: 2025-07-02
DOI: 10.1016/j.atech.2025.101158
URL: https://www.sciencedirect.com/science/article/pii/S2772375525003909
Citation count: 0
Abstract
Accurate yield prediction is critical for optimizing agricultural practices and ensuring food security. This study evaluated the performance of machine learning models in predicting tomato yield using weather data, spectral bands, and vegetation indices, with varying nitrogen rates and bio-stimulant treatments applied to induce plant growth variability. UAV-based spectral data were collected on seven dates from October 27 to December 15, 2023, corresponding to key phenological stages: vegetative growth (data collection date 1), flowering (dates 2 and 3), fruit development (dates 4, 5, and 6), and early ripening (date 7). Significant input features were identified using the Pearson correlation coefficient (r > 0.65, p < 0.05), including Near Infrared (NIR), Red Edge, and Red spectral bands, as well as vegetation indices such as NDVI, GNDVI, NDRE, and SAVI. Aerial spectral data collected during fruit development (dates 5 and 6) showed the strongest correlations with yield (r = 0.66–0.74), emphasizing the importance of mid-to-late-season spectral information. Among the models evaluated, linear regression (LR) and XGBoost achieved the best performance, with root mean squared error (RMSE) values of 16.13 kg and 16.15 kg, respectively, and R² values of 0.63. Support vector machine (SVM) and decision tree (DT) also performed well, with RMSE values of 17.15 kg and 17.18 kg, respectively. In contrast, the deep learning model underperformed (RMSE = 23.49 kg, R² = 0.23), likely because of the limited amount of data. This study highlights the predictive potential of spectral bands and emphasizes the significance of phenologically timed spectral data for yield estimation.
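The feature-screening and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The index formulas are the commonly used definitions, the SAVI soil factor `L = 0.5` is an assumption (the abstract gives no value), and the p < 0.05 significance check is omitted. The function names, the use of the absolute value of r in `select_features`, and all numbers in the demo are hypothetical and serve only to show the mechanics.

```python
import math


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """Green NDVI: same form as NDVI with the green band."""
    return (nir - green) / (nir + green)


def ndre(nir, red_edge):
    """Normalized Difference Red Edge index."""
    return (nir - red_edge) / (nir + red_edge)


def savi(nir, red, L=0.5):
    """Soil-Adjusted Vegetation Index; L = 0.5 is an assumed soil factor."""
    return (1 + L) * (nir - red) / (nir + red + L)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_features(features, target, threshold=0.65):
    """Keep feature names whose |r| with the target exceeds the threshold.

    Using |r| is an assumption so that strongly negative features also pass;
    no p-value test is performed here.
    """
    return [name for name, values in features.items()
            if abs(pearson_r(values, target)) > threshold]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up per-plot reflectances and yields (kg), for illustration only.
    nir = [0.42, 0.48, 0.51, 0.55, 0.60]
    red = [0.12, 0.10, 0.09, 0.08, 0.06]
    yields = [48.0, 55.0, 61.0, 70.0, 82.0]

    features = {
        "NIR": nir,
        "Red": red,
        "NDVI": [ndvi(n, r) for n, r in zip(nir, red)],
        "SAVI": [savi(n, r) for n, r in zip(nir, red)],
    }
    print("Selected features:", select_features(features, yields))
```

A model trained on the selected features would then be scored with `rmse` and `r2_score` on held-out plots, which are the two metrics the abstract reports for each model.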