Effect of agriculture-related dataset complexity on classical machine learning and deep learning classifiers performance
Gerardo Acevedo-Sánchez, Antonio Alarcón-Paredes, Cornelio Yáñez-Márquez
Computers and Electronics in Agriculture, volume 239, Article 110941. Published 2025-09-03.
DOI: 10.1016/j.compag.2025.110941
URL: https://www.sciencedirect.com/science/article/pii/S0168169925010476
Impact factor: 8.9 (JCR Q1, Agriculture, Multidisciplinary)
Citations: 0
Abstract
This study evaluates how five indicators of dataset complexity affect the performance of 24 machine learning (ML) and deep learning (DL) classifiers across eight publicly available agriculture-related datasets. The indicators were cardinality (320–13,611 instances), dimensionality (7–35 features), class imbalance (Imbalance Ratio [IR] = 1–109.9), number of classes (2–40), and feature types (numeric and ordinal). Performance measures, including sensitivity, specificity, balanced accuracy (BA), precision, F1-score, and Matthews Correlation Coefficient (MCC), were derived from confusion matrices generated via a 10-fold cross-validation procedure. Macro and weighted averages were included as overall measures. Nonparametric tests (Friedman-Nemenyi, p < 0.05) and Cliff's δ effect sizes were computed for weighted-average sensitivity and BA. Across 192 analyses, ensembles (GBM, XGBoost, RF) and C5.0 significantly outperformed the other classifiers on 5 of the 8 datasets, achieving values greater than 0.91. Artificial Neural Networks (ANN) were ineffective on tabular data (BA ≤ 0.50). Extreme imbalance (White Wine: IR = 109.9) degraded classifier performance, mainly for distance-based and probabilistic classifiers (MCC ≤ 0.34); even the ensembles only partially mitigated the bias (BA ≤ 0.65). High dimensionality (Date Fruits: 34 features) favored LDA and RF (BA ≥ 0.93). Conversely, on the dataset with the most classes (Soybean Cultivars: 40 classes), IBk achieved higher performance (BA = 0.87). Sixty paired comparisons showed significant differences (p < 0.00001) and strong effects (δ = -0.57 to 0.18) between ensembles and underperforming classifiers, confirming that dimensionality, IR, and the number of classes directly determine performance. To the best of our knowledge, this is the first large-scale comparison of 24 ML/DL classifiers on eight agricultural datasets.
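The measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and every number in it is made up for illustration. It assumes the common definition of the imbalance ratio (majority-class count divided by minority-class count), one-vs-rest sensitivity and specificity read off a confusion matrix, balanced accuracy as their mean, and Cliff's δ as the difference between the shares of pairs in which one group's value exceeds the other's. The function names are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count / minority-class count (assumed IR definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def per_class_measures(cm):
    """One-vs-rest sensitivity, specificity and balanced accuracy per class.

    cm[i][j] = number of instances of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    measures = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        measures.append({
            "support": tp + fn,
            "sensitivity": sens,
            "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2,
        })
    return measures


def macro_average(measures, key):
    """Unweighted mean over classes: every class counts equally."""
    return sum(m[key] for m in measures) / len(measures)


def weighted_average(measures, key):
    """Mean over classes weighted by class support (true instance count)."""
    total = sum(m["support"] for m in measures)
    return sum(m[key] * m["support"] for m in measures) / total


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y minus #pairs with x < y) / (|xs| * |ys|)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical two-class example with a 10:1 imbalance.
labels = [0] * 100 + [1] * 10
cm = [[90, 10],
      [5, 5]]
measures = per_class_measures(cm)
print("IR:", imbalance_ratio(labels))
print("macro BA:", macro_average(measures, "balanced_accuracy"))
print("weighted sensitivity:", weighted_average(measures, "sensitivity"))
# Hypothetical per-fold scores for two classifiers.
print("Cliff's delta:", cliffs_delta([0.90, 0.92, 0.95], [0.50, 0.60, 0.93]))
```

In this toy example the weighted-average sensitivity sits well above the macro average because the majority class dominates the weighting. This is why imbalance-aware measures such as BA and MCC are reported alongside it.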
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and applications notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.