{"title":"农业研究中评估模型性能的常见缺陷和避免策略","authors":"C.P. James Chen, Robin R. White, Ryan Wright","doi":"10.1016/j.compag.2025.110126","DOIUrl":null,"url":null,"abstract":"<div><div>Predictive modeling is a cornerstone of data-driven research and decision-making in precision agriculture, yet achieving robust, interpretable, and reproducible model evaluations remains challenging. This study addresses two central issues in model evaluation – methodological pitfalls in cross-validation (CV) and data-structure effects on performance metrics – across five simulation experiments supplemented by real-world data. First, we show how the choice of estimator (e.g., 2-fold, 5-fold, or leave-one-out CV) and sample size affects the reliability of performance estimates: although leave-one-out CV can be unbiased for error-based metrics, it systematically underestimates correlation-based metrics. Second, we demonstrate that reusing the test data during model selection (e.g., feature selection, hyperparameter tuning) inflates performance estimates, reinforcing the need for proper separation of training, validation, and test sets. Third, we reveal how ignoring experimental block effects, such as seasonal or herd variations, introduces an upward bias in performance measures highlighting the importance of block CV when predictions are intended for new, previously unseen environment. Fourth, we highlight that different regression metrics – ranging from correlation-based to error-based (e.g., root mean squared error) – capture distinct aspects of predictive performance an under varying error biases and variances. Finally, for classification tasks, class imbalance and threshold settings significantly alter performance metrics, illustrating why a single metric rarely suffices to characterize model performance comprehensively. 
Collectively, these findings emphasize the need for careful alignment between modeling objectives, CV strategies, and metric selection, thereby ensuring trustworthy and generalizable model assessments in precision agriculture and beyond.</div></div>","PeriodicalId":50627,"journal":{"name":"Computers and Electronics in Agriculture","volume":"234 ","pages":"Article 110126"},"PeriodicalIF":8.9000,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Common pitfalls in evaluating model performance and strategies for avoidance in agricultural studies\",\"authors\":\"C.P. James Chen, Robin R. White, Ryan Wright\",\"doi\":\"10.1016/j.compag.2025.110126\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Predictive modeling is a cornerstone of data-driven research and decision-making in precision agriculture, yet achieving robust, interpretable, and reproducible model evaluations remains challenging. This study addresses two central issues in model evaluation – methodological pitfalls in cross-validation (CV) and data-structure effects on performance metrics – across five simulation experiments supplemented by real-world data. First, we show how the choice of estimator (e.g., 2-fold, 5-fold, or leave-one-out CV) and sample size affects the reliability of performance estimates: although leave-one-out CV can be unbiased for error-based metrics, it systematically underestimates correlation-based metrics. Second, we demonstrate that reusing the test data during model selection (e.g., feature selection, hyperparameter tuning) inflates performance estimates, reinforcing the need for proper separation of training, validation, and test sets. 
Third, we reveal how ignoring experimental block effects, such as seasonal or herd variations, introduces an upward bias in performance measures highlighting the importance of block CV when predictions are intended for new, previously unseen environment. Fourth, we highlight that different regression metrics – ranging from correlation-based to error-based (e.g., root mean squared error) – capture distinct aspects of predictive performance an under varying error biases and variances. Finally, for classification tasks, class imbalance and threshold settings significantly alter performance metrics, illustrating why a single metric rarely suffices to characterize model performance comprehensively. Collectively, these findings emphasize the need for careful alignment between modeling objectives, CV strategies, and metric selection, thereby ensuring trustworthy and generalizable model assessments in precision agriculture and beyond.</div></div>\",\"PeriodicalId\":50627,\"journal\":{\"name\":\"Computers and Electronics in Agriculture\",\"volume\":\"234 \",\"pages\":\"Article 110126\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-03-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers and Electronics in Agriculture\",\"FirstCategoryId\":\"97\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0168169925002327\",\"RegionNum\":1,\"RegionCategory\":\"农林科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AGRICULTURE, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers and Electronics in 
Agriculture","FirstCategoryId":"97","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0168169925002327","RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AGRICULTURE, MULTIDISCIPLINARY","Score":null,"Total":0}
Common pitfalls in evaluating model performance and strategies for avoidance in agricultural studies
Predictive modeling is a cornerstone of data-driven research and decision-making in precision agriculture, yet achieving robust, interpretable, and reproducible model evaluations remains challenging. This study addresses two central issues in model evaluation – methodological pitfalls in cross-validation (CV) and data-structure effects on performance metrics – across five simulation experiments supplemented by real-world data. First, we show how the choice of estimator (e.g., 2-fold, 5-fold, or leave-one-out CV) and sample size affect the reliability of performance estimates: although leave-one-out CV can be unbiased for error-based metrics, it systematically underestimates correlation-based metrics. Second, we demonstrate that reusing the test data during model selection (e.g., feature selection, hyperparameter tuning) inflates performance estimates, reinforcing the need for proper separation of training, validation, and test sets. Third, we reveal how ignoring experimental block effects, such as seasonal or herd variations, introduces an upward bias in performance measures, highlighting the importance of block CV when predictions are intended for new, previously unseen environments. Fourth, we highlight that different regression metrics – ranging from correlation-based to error-based (e.g., root mean squared error) – capture distinct aspects of predictive performance under varying error biases and variances. Finally, for classification tasks, class imbalance and threshold settings significantly alter performance metrics, illustrating why a single metric rarely suffices to characterize model performance comprehensively. Collectively, these findings emphasize the need for careful alignment between modeling objectives, CV strategies, and metric selection, thereby ensuring trustworthy and generalizable model assessments in precision agriculture and beyond.
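The first finding, that leave-one-out CV distorts correlation-based metrics, can be illustrated with a minimal sketch on synthetic data (an intercept-only model, not the paper's actual experiments). Each held-out "prediction" is the mean of the remaining observations, so the pooled LOO predictions are a decreasing affine function of the held-out values, and the Pearson correlation is exactly -1 even though the model is unbiased on error-based metrics:

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=30)  # outcome with no predictable structure

# Intercept-only model: the leave-one-out "prediction" for sample i
# is the mean of the other n - 1 observations.
n = len(y)
loo_pred = (y.sum() - y) / (n - 1)

# loo_pred is an affine, decreasing function of y, so the pooled
# Pearson correlation is exactly -1 despite an unbiased model.
r = np.corrcoef(y, loo_pred)[0, 1]
print(round(r, 6))  # -1.0
```

The same mechanism, in weaker form, depresses pooled LOO correlations for real regression models, which is one reason correlation-based metrics should not be pooled naively across leave-one-out folds.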
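The second finding, score inflation from reusing test data during model selection, can be reproduced with scikit-learn on pure-noise data (a hypothetical sketch, not the authors' pipeline). Selecting features on the full dataset before cross-validating leaks test information into the selection step; wrapping selection in a pipeline keeps it inside each training fold:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 500))    # pure-noise features
y = rng.integers(0, 2, size=60)   # labels independent of X

# Pitfall: pick the "best" 10 features using ALL the data, then
# cross-validate. The test folds already influenced the selection.
X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)
acc_leaky = cross_val_score(LogisticRegression(), X_sel, y, cv=5).mean()

# Fix: put selection inside the CV loop via a pipeline, so each
# fold's test data never touches the feature-selection step.
pipe = make_pipeline(SelectKBest(f_classif, k=10), LogisticRegression())
acc_proper = cross_val_score(pipe, X, y, cv=5).mean()
```

With labels that are independent of the features, `acc_proper` hovers near chance while `acc_leaky` is substantially inflated, which is exactly the separation-of-data argument the abstract makes.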
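The third finding can be sketched with scikit-learn's `GroupKFold` on synthetic data carrying herd- or season-like block effects (an illustrative setup, not the paper's data). Random folds mix blocks across training and test sets, so a flexible model can memorize block-specific intercepts; group folds hold out whole blocks, matching the goal of predicting for a new, unseen environment:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(42)
n_blocks, n_per = 6, 50
groups = np.repeat(np.arange(n_blocks), n_per)

# Block-specific shifts in both the feature and the outcome,
# mimicking herd or season effects.
mu = rng.normal(scale=2.0, size=n_blocks)  # block mean of the feature
a = rng.normal(scale=3.0, size=n_blocks)   # block intercept of the outcome
X = (mu[groups] + rng.normal(size=n_blocks * n_per)).reshape(-1, 1)
y = 0.5 * X[:, 0] + a[groups] + rng.normal(scale=0.5, size=n_blocks * n_per)

model = RandomForestRegressor(n_estimators=100, random_state=0)

# Random folds: every block appears in both train and test, so the
# model exploits memorized block intercepts -> optimistic R2.
r2_random = cross_val_score(
    model, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()

# Block (group) folds: each test fold is an entirely unseen block,
# the honest estimate for predictions in a new herd or season.
r2_block = cross_val_score(
    model, X, y, cv=GroupKFold(n_splits=6), groups=groups).mean()
```

The gap between `r2_random` and `r2_block` is the upward bias the abstract describes; which estimate is "right" depends on whether deployment targets seen or unseen blocks.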
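The final point about class imbalance can be shown deterministically with scikit-learn's metrics (a sketch assuming a 5 % positive class and a degenerate majority-class predictor, not the paper's classifiers). A single metric tells three different stories about the same predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# 5 % positives (e.g., diseased plants in a scouting dataset) and a
# degenerate classifier that predicts the majority class everywhere.
y_true = np.array([1] * 50 + [0] * 950)
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)            # 0.95: looks excellent
bal = balanced_accuracy_score(y_true, y_pred)   # 0.5: chance level
f1 = f1_score(y_true, y_pred, zero_division=0)  # 0.0: finds no positives
print(acc, bal, f1)  # 0.95 0.5 0.0
```

Reporting accuracy alone would rate this useless model at 95 %, which is why the abstract argues that no single metric suffices under imbalance.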
Journal introduction:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant and animal agricultural production, covering topics such as agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.