M M Perez, E Cabrera, C Rial, Y You, Y Wang, K Weinberger, D V Nydam, J O Giordano
Journal of Dairy Science. Publication date: 2025-10-01. DOI: 10.3168/jds.2025-26511 (https://doi.org/10.3168/jds.2025-26511)
Screening and selection of a machine learning algorithm for development of a model to select cows for clinical examination using data from automated health monitoring technologies and other predictors of cow health.
The objective of this study was to create a framework for training and selecting machine learning algorithms (MLA) to classify cow health status daily using data from multiple automated health monitoring systems (AHMS), including wearable and nonwearable sensors, combined with nonsensor data of potential value for predicting cow health. The work presented in this manuscript is part of a series of studies aimed at identifying a single candidate algorithm that, upon extensive refinement and further development, could be deployed in a commercial dairy operation to identify cows potentially affected by health disorders for clinical examination. Data from AHMS and other cow features and performance data, including the clinical health status of cows, were collected in a prospective cohort study including Holstein cows (n = 1,252). Data from AHMS used for MLA training included rumination, eating, and physical activity measured in the neck (neck sensor), temperature and physical activity measured in the reticulorumen (bolus sensor), physical activity and resting measured in the leg (leg sensor), and milk yield, milk electrical conductivity, and milk components (parlor sensors). Other non-AHMS data used were temperature-humidity index, cow and calving event features, and current and previous lactation performance and management indicators. The dataset included 22,415 cow-day records with 49 features. The dataset was split into training and testing sets in an 80:20 ratio, resulting in 17,887 and 4,528 cow-day records, respectively. Data imputation and standardization were applied automatically or manually. A diverse set of nondeep learning MLA (n = 26) was trained and compared using the open-source automated ML (AutoML) tool Lazy Predict Classifier (LZP).
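The preprocessing steps named above (a train/test split near 80:20, imputation, and standardization) might be sketched as follows. This is a minimal illustration in plain Python, not the authors' pipeline: the abstract does not say how records were assigned to each set or which imputation method was used, so the random record-level split, the mean imputation, the function names, and the toy values are all assumptions.

```python
import random
import statistics


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle records and split them into (train, test) lists. Hypothetical helper."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_scaler(train_rows):
    """Per-feature mean and standard deviation from training rows, skipping missing values (None)."""
    params = []
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if row[j] is not None]
        mean = statistics.fmean(observed)
        sd = statistics.pstdev(observed) or 1.0  # avoid dividing by zero for constant features
        params.append((mean, sd))
    return params


def transform(rows, params):
    """Impute missing values with the training mean (an assumption), then z-score each feature."""
    return [
        [((mean if value is None else value) - mean) / sd
         for value, (mean, sd) in zip(row, params)]
        for row in rows
    ]


# Invented cow-day records: [rumination (min/d), milk yield (kg/d)]; None marks a missing value.
records = [[480.0, 35.2], [None, 31.0], [510.0, None], [450.0, 38.4], [495.0, 33.1]]
train, test = train_test_split(records, test_fraction=0.2, seed=42)
params = fit_scaler(train)  # statistics come from the training set only
train_z = transform(train, params)
test_z = transform(test, params)
```

Fitting the scaler on the training rows alone and reusing those parameters for the test rows keeps test-set information out of training, which is the usual reason to separate the two steps.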
Upon selection of the best-performing nondeep learning algorithms (i.e., XGBoost, AdaBoost, Nearest Centroid, and Bernoulli Naive Bayes) from the pool tested with LZP, these classifiers were compared with more complex deep learning algorithms (multilayer perceptron, recurrent neural networks, long short-term memory networks, and gated recurrent unit models) not included in LZP. All algorithms underwent training and evaluation with several performance metrics before a single best-performing algorithm was selected. Ensemble learning models, particularly XGBoost, achieved the best and most balanced performance, with a sensitivity of 82.4%, a precision of 42.6%, a specificity of 86.4%, and a negative predictive value of 97.6%. This model also had the highest F1-score (0.56) and area under the curve (84.4%). The XGBoost algorithm also demonstrated robustness in handling missing data. Our comprehensive approach to MLA screening and selection enabled informed decisions in selecting a suitable algorithm for identifying cows for clinical examination based on the daily prediction of health disorder occurrence. The combination of the AutoML tool LZP with manual refinement and testing of multiple MLA provided a robust framework for comparing multiple ML models. Ensemble classification algorithms such as XGBoost and AdaBoost might outperform deep learning and other nondeep learning algorithms for classifying cow health daily using AHMS and other cow management and performance indicators.
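The performance metrics reported above are all functions of a binary confusion matrix, and the F1-score can be checked directly from the stated precision and sensitivity. The sketch below uses standard textbook definitions; the function names and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of truly affected cow-days that were flagged
    specificity = tn / (tn + fp)   # share of healthy cow-days that were not flagged
    precision = tp / (tp + fp)     # share of flagged cow-days that were truly affected
    npv = tn / (tn + fn)           # share of unflagged cow-days that were truly healthy
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": npv,
        "f1": f1_score(precision, sensitivity),
    }


def f1_score(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# F1 recomputed from the precision and sensitivity reported in the abstract.
reported_f1 = round(f1_score(0.426, 0.824), 2)
print(reported_f1)  # 0.56

# Invented counts, for illustration only.
example = classification_metrics(tp=82, fp=110, tn=790, fn=18)
```

The recomputed value agrees with the reported F1-score of 0.56. The combination of modest precision and high negative predictive value is typical when the positive class is uncommon, because even a small false-positive rate among many healthy cow-days produces a large number of false alerts.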
About the journal:
The official journal of the American Dairy Science Association®, Journal of Dairy Science® (JDS) is the leading peer-reviewed general dairy research journal in the world. JDS readers represent education, industry, and government agencies in more than 70 countries with interests in biochemistry, breeding, economics, engineering, environment, food science, genetics, microbiology, nutrition, pathology, physiology, processing, public health, quality assurance, and sanitation.