Oscar R Espinoza Sandoval, Juan C Angeles-Hernandez, Agustín Corral-Luna, Felipe A Rodríguez-Almeida, Pablo Pinedo, Albert De Vries, Santiago A Utsumi, Einar Vargas-Bello-Pérez
Journal of Dairy Research, pp. 1-10. Published 2025-09-18. Journal Article. Impact factor 1.2 (JCR Q2, Agriculture, Dairy & Animal Science). DOI: 10.1017/S0022029925101416 (https://doi.org/10.1017/S0022029925101416)
A supervised machine learning approach for the decision-making process on data-based culling in dairy farms.
This research paper aimed to develop a supervised machine learning (ML) approach that learns and predicts data-based culling from farm information reflecting the criteria a farm manager uses when deciding to cull a cow. Data on milk yield, days in milk, lactation number, pregnancy status, days open and days pregnant were obtained from January to December 2020 from dairy cows on a large dairy farm in northern Mexico. The cows were labelled as either data-based culled (Cull) or not culled (Stay). Six supervised ML algorithms were evaluated on this binary classification task: logistic regression (LR), Gaussian naïve Bayes (GNB), k-nearest neighbors (k-NN), support vector machine (SVM), random forest (RF) and multilayer perceptron (MLP). Each model was subjected to hyperparameter optimization using a grid search combined with tenfold stratified cross-validation. This ensured that the class imbalance (Cull vs. Stay) was accounted for during model evaluation. The best-performing model for each algorithm was selected on cross-validated accuracy. To evaluate the prediction performance of the ML algorithms on both labels from learned data, accuracy, precision, recall, F1-score and the Matthews correlation coefficient (MCC) were used. Accuracy was >0.90 for all classifiers. The poorest prediction performance was observed for GNB (MCC = 0.50) and LR (MCC = 0.72). Conversely, the remaining classifiers achieved superior prediction performance in learning the specific culling criteria, reaching an MCC >0.91. Overall, culling criteria can be learned and predicted by ML algorithms, and performance varies among classifiers. This study identified RF as the best-performing algorithm, but k-NN, SVM and MLP are possible candidates for use under on-farm conditions. To increase their reliability, these approaches need to be tested on several farms, under different scenarios and with different sets of features.
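The abstract reports accuracy above 0.90 for every classifier while MCC ranges from 0.50 to above 0.91. The minimal sketch below illustrates how both can hold at once when the classes are imbalanced. It uses the standard definitions of accuracy and MCC computed from confusion-matrix counts. The 100-cow example, its 10:90 Cull-to-Stay split, and the function names are hypothetical and are not taken from the study's data or code.

```python
import math


def confusion_counts(y_true, y_pred, positive="Cull"):
    """Count true/false positives and negatives, treating `positive` as the target label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical herd (assumption, not study data): 10 Cull cows and 90 Stay cows.
y_true = ["Cull"] * 10 + ["Stay"] * 90
# Hypothetical classifier: finds only 5 of the 10 Cull cows and flags 3 Stay cows by mistake.
y_pred = ["Cull"] * 5 + ["Stay"] * 5 + ["Cull"] * 3 + ["Stay"] * 87

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
score = mcc(*counts)
print(f"accuracy={acc:.2f}  MCC={score:.2f}")
```

In this made-up case the classifier misses half of the Cull cows, yet accuracy is still 0.92 because the Stay class dominates, while MCC is only about 0.52. This is one reason an imbalance-aware metric such as MCC is informative alongside accuracy.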
About the journal:
The Journal of Dairy Research is an international journal of high standing that publishes original scientific research on all aspects of the biology, wellbeing and technology of lactating animals and the foods they produce. The Journal's ability to cover the entire dairy foods chain is a major strength. Cross-disciplinary research is particularly welcomed, as is comparative lactation research in different dairy and non-dairy species and research dealing with consumer health aspects of dairy products. Journal of Dairy Research: an international journal of the lactation sciences.