Haiyan Zhang, Yao Zhang, Xiu-hua Li, Minzan Li, Zezhong Tian
Title: Predicting Banana Yield at the Field Scale by Combining Sentinel-2 Time Series Data and Regression Models
Journal: Applied Engineering in Agriculture (impact factor 0.8; JCR Q4, Agricultural Engineering)
Publication date: 2023-01-01
DOI: 10.13031/aea.15220 (https://doi.org/10.13031/aea.15220)
Citations: 1
Abstract
Highlights: A dataset expansion method based on random sampling could improve the robustness of yield estimation models. The Chlorophyll Index Red Edge (CIRE) was more suitable for banana yield estimation than the other vegetation indices compared. An XGBoost-based estimation method showed good ability to predict banana yield.

Banana yield prediction at the field level offers significant benefits to growers, packinghouses, crop insurance companies, and researchers. This study explored a remote sensing-based approach for forecasting banana yield at the field scale using Sentinel-2 (S2) image time series and regression models. First, S2 images covering the critical phenological periods of banana were acquired from the Google Earth Engine platform, and clouds and cloud shadows were removed from them. Second, the dataset was expanded by randomly selecting pixels within each field to improve the accuracy of yield prediction. Third, nine vegetation indices (VIs) that correlate strongly with crop yield were compared and analyzed; CIRE was selected for its particularly high predictive ability for banana yield. Finally, six regression models were employed and their performances compared: least absolute shrinkage and selection operator (LASSO), support vector regression (SVR), k-nearest neighbors (k-NN), random forest (RF), gradient boosted regression trees (GBRT), and extreme gradient boost (XGBoost).

Results showed that banana yield was predicted best when 70 pixels were selected for each banana field. Across the nine VIs and six regression models compared, the XGBoost model emerged as the best learner (mean R² over 100 runs of 0.84 for 2019 and 0.79 for 2020). It was followed by the GBRT model with almost the same performance, which explained 82% and 79% of the banana yield variability for 2019 and 2020, respectively. The LASSO model exhibited the lowest performance of all, but it performed best in terms of stability. The proposed framework, applied to satellite image time series, can achieve reliable banana yield prediction across years at the field scale.

Keywords: Banana yield prediction, Extreme gradient boost, Sentinel-2, Vegetation index.
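The index computation and the random-sampling expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the commonly used definition CIRE = NIR / red edge - 1 (the abstract does not say which Sentinel-2 bands were used), and it assumes that each sampled pixel becomes one training sample labeled with its field's yield. The function names, the synthetic reflectance values, and the yield figure are all hypothetical.

```python
import random


def cire(nir, red_edge):
    """Chlorophyll Index Red Edge, assuming the common form NIR / RedEdge - 1."""
    if red_edge <= 0:
        raise ValueError("red-edge reflectance must be positive")
    return nir / red_edge - 1.0


def expand_field(pixels, field_yield, n_pixels=70, seed=0):
    """Randomly draw n_pixels pixels from one field.

    Each drawn pixel (a list of per-date CIRE values) is paired with the
    field-level yield, so one field contributes n_pixels training samples.
    This pairing is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    if len(pixels) >= n_pixels:
        chosen = rng.sample(pixels, n_pixels)     # sample without replacement
    else:
        chosen = rng.choices(pixels, k=n_pixels)  # small field: with replacement
    return [(px, field_yield) for px in chosen]


# Synthetic stand-in for one field: 200 pixels, 3 acquisition dates each.
demo_rng = random.Random(42)
field_pixels = [
    [cire(demo_rng.uniform(0.3, 0.5), demo_rng.uniform(0.1, 0.2)) for _ in range(3)]
    for _ in range(200)
]

# 35.0 is a made-up yield value in arbitrary units.
samples = expand_field(field_pixels, field_yield=35.0)
print(len(samples), "samples, each with", len(samples[0][0]), "CIRE features")
```

The default of 70 pixels per field mirrors the setting the abstract reports as giving the best prediction. The resulting (features, yield) pairs could then be passed to any of the six regressors named above.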
About the journal:
This peer-reviewed journal publishes applications of engineering and technology research that address agricultural, food, and biological systems problems. Submissions must include results of practical experiences, tests, or trials presented in a manner and style that will allow easy adaptation by others; results of reviews or studies of installations or applications with substantially new or significant information not readily available in other refereed publications; or a description of successful methods or techniques of education, outreach, or technology transfer.