A supervised machine learning approach for the decision-making process on data-based culling in dairy farms.

IF 1.2 · Zone 3 (Agricultural and Forestry Sciences) · Q2 · AGRICULTURE, DAIRY & ANIMAL SCIENCE

Journal of Dairy Research Pub Date : 2025-09-18 DOI:10.1017/S0022029925101416

Oscar R Espinoza Sandoval, Juan C Angeles-Hernandez, Agustín Corral-Luna, Felipe A Rodríguez-Almeida, Pablo Pinedo, Albert De Vries, Santiago A Utsumi, Einar Vargas-Bello-Pérez


Citations: 0

Abstract


A supervised machine learning approach for the decision-making process on data-based culling in dairy farms.

This research paper aimed to develop a supervised machine learning (ML) approach that learns and predicts data-based culling from farm information reflecting the criteria a farm manager uses when deciding to cull a cow. Data on milk yield, days in milk, lactation number, pregnancy status, days open and days pregnant were obtained from January to December 2020 for dairy cows on a large dairy farm in northern Mexico. Cows were labelled as either data-based culled (Cull) or not culled (Stay). Six supervised ML algorithms were evaluated in a binary classification task: logistic regression (LR), Gaussian naïve Bayes (GNB), k-nearest neighbors (k-NN), support vector machine (SVM), random forest (RF) and multilayer perceptron (MLP). Each model underwent hyperparameter optimization using a grid search combined with tenfold stratified cross-validation, which ensured that the class imbalance (Cull vs. Stay) was accounted for during model evaluation. The best-performing model for each algorithm was selected based on cross-validated accuracy. Prediction performance on both labels was evaluated with accuracy, precision, recall, F₁-score and the Matthews correlation coefficient (MCC). Accuracy was >0.90 for all classifiers. The poorest prediction performance was observed for GNB (MCC = 0.50) and LR (MCC = 0.72). The remaining classifiers learned the specific culling criteria better, reaching an MCC >0.91. Overall, culling criteria can be learned and predicted by ML algorithms, and performance varies among classifiers. This study identified RF as the best-performing algorithm, but k-NN, SVM and MLP are also possible candidates for use under on-farm conditions. To increase their reliability, these approaches need to be tested on several farms, under different scenarios and with varied feature sets.
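The abstract reports accuracy above 0.90 for every classifier while MCC ranges from 0.50 to above 0.91. The following minimal Python sketch illustrates how that gap can arise under class imbalance, and how a stratified split keeps the Cull/Stay ratio similar in each fold. It is not the authors' code: the herd size, the confusion-matrix counts and the helper names `binary_metrics` and `stratified_folds` are all hypothetical, chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, F1 and MCC for one confusion matrix.

    'Positive' here means Cull; ratios with a zero denominator become 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical herd: 100 Cull and 900 Stay records (made-up counts).
labels = ["Cull"] * 100 + ["Stay"] * 900
folds = stratified_folds(labels, k=10)
cull_per_fold = [sum(labels[i] == "Cull" for i in fold) for fold in folds]

# Hypothetical confusion matrix for a weak classifier on that herd.
weak = binary_metrics(tp=40, fp=10, fn=60, tn=890)

if __name__ == "__main__":
    print("Cull records per fold:", cull_per_fold)
    for name, value in weak.items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, the classifier misses 60 of the 100 Cull cows yet still scores an accuracy of 0.93, because the Stay class dominates; its MCC is only about 0.54. This is one plausible reading of why MCC separates the classifiers more sharply than accuracy does in the reported results.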


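Source journal

Journal of Dairy Research (Agricultural and Forestry Sciences: Dairy and Animal Science)

CiteScore: 3.80

Self-citation rate: 4.80%

Publication volume: 117

Review time: 12-24 weeks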

Journal description: The Journal of Dairy Research is an international journal of high standing that publishes original scientific research on all aspects of the biology, wellbeing and technology of lactating animals and the foods they produce. The Journal's ability to cover the entire dairy foods chain is a major strength. Cross-disciplinary research is particularly welcomed, as is comparative lactation research in different dairy and non-dairy species and research dealing with consumer health aspects of dairy products. Journal of Dairy Research: an international journal of the lactation sciences.