Common pitfalls in evaluating model performance and strategies for avoidance in agricultural studies

IF 8.9 · CAS Tier 1 (Agricultural Sciences) · JCR Q1 · AGRICULTURE, MULTIDISCIPLINARY
C.P. James Chen, Robin R. White, Ryan Wright
DOI: 10.1016/j.compag.2025.110126
Journal: Computers and Electronics in Agriculture, Volume 234, Article 110126
Published: 2025-03-06 (Journal Article)
Citations: 0

Abstract

Predictive modeling is a cornerstone of data-driven research and decision-making in precision agriculture, yet achieving robust, interpretable, and reproducible model evaluations remains challenging. This study addresses two central issues in model evaluation – methodological pitfalls in cross-validation (CV) and data-structure effects on performance metrics – across five simulation experiments supplemented by real-world data. First, we show how the choice of estimator (e.g., 2-fold, 5-fold, or leave-one-out CV) and sample size affects the reliability of performance estimates: although leave-one-out CV can be unbiased for error-based metrics, it systematically underestimates correlation-based metrics. Second, we demonstrate that reusing the test data during model selection (e.g., feature selection, hyperparameter tuning) inflates performance estimates, reinforcing the need for proper separation of training, validation, and test sets. Third, we reveal how ignoring experimental block effects, such as seasonal or herd variations, introduces an upward bias in performance measures, highlighting the importance of block CV when predictions are intended for new, previously unseen environments. Fourth, we highlight that different regression metrics – ranging from correlation-based to error-based (e.g., root mean squared error) – capture distinct aspects of predictive performance under varying error biases and variances. Finally, for classification tasks, class imbalance and threshold settings significantly alter performance metrics, illustrating why a single metric rarely suffices to characterize model performance comprehensively. Collectively, these findings emphasize the need for careful alignment between modeling objectives, CV strategies, and metric selection, thereby ensuring trustworthy and generalizable model assessments in precision agriculture and beyond.
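The fourth point – that correlation-based and error-based metrics capture different aspects of performance – can be illustrated with a minimal, self-contained sketch (plain Python; the toy data and function names are illustrative and not taken from the study): a prediction with a constant additive bias attains a perfect Pearson correlation yet a large RMSE, while a scattered but unbiased prediction shows the opposite pattern.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes both systematic bias and error variance."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Pearson correlation: invariant to any additive or multiplicative shift
    of the predictions, so it is blind to systematic bias."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

y = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [v + 10.0 for v in y]          # constant offset: r ≈ 1.0 but RMSE = 10.0
noisy = [1.5, 1.5, 3.5, 3.5, 5.0]       # no bias, scattered errors: RMSE ≈ 0.45

print(pearson_r(y, biased), rmse(y, biased))  # ≈ 1.0, 10.0
print(pearson_r(y, noisy), rmse(y, noisy))    # ≈ 0.95, ≈ 0.45
```

Ranking the two candidate predictions by correlation alone would pick the heavily biased one; ranking by RMSE alone hides the fact that the biased prediction is perfectly calibratable by a constant correction. This is why the study argues that a single metric rarely suffices.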
Source journal
Computers and Electronics in Agriculture (Engineering Technology - Computer Science: Interdisciplinary Applications)
CiteScore: 15.30
Self-citation rate: 14.50%
Annual publications: 800
Review time: 62 days
About the journal: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.