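Screening and selection of a machine learning algorithm for development of a model to select cows for clinical examination using data from automated health monitoring technologies and other predictors of cow health.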

IF 4.4; Zone 1 (Agricultural and Forestry Sciences); Q1 AGRICULTURE, DAIRY & ANIMAL SCIENCE

Journal of Dairy Science; Pub Date: 2025-10-01; DOI: 10.3168/jds.2025-26511

M M Perez, E Cabrera, C Rial, Y You, Y Wang, K Weinberger, D V Nydam, J O Giordano


Citations: 0

Abstract




The objective of this study was to create a framework for training and selecting machine learning algorithms (MLA) that classify cow health status daily using data from multiple automated health monitoring systems (AHMS), including wearable and nonwearable sensors, combined with nonsensor data of potential value for predicting cow health. This work is part of a series of studies aimed at identifying a single candidate algorithm that, after extensive refinement and further development, could be deployed in a commercial dairy operation to identify cows potentially affected by health disorders for clinical examination. AHMS data, other cow features, and performance data, including the clinical health status of cows, were collected in a prospective cohort study of Holstein cows (n = 1,252). The AHMS data used for MLA training included rumination, eating, and physical activity measured at the neck (neck sensor); temperature and physical activity measured in the reticulorumen (bolus sensor); physical activity and resting measured at the leg (leg sensor); and milk yield, milk electrical conductivity, and milk components (parlor sensors). Other, non-AHMS data included the temperature-humidity index, cow and calving event features, and current and previous lactation performance and management indicators. The dataset comprised 22,415 cow-day records with 49 features and was split into training and testing sets in an 80:20 ratio (17,887 and 4,528 cow-day records, respectively). Data imputation and standardization were applied either automatically or manually. A diverse set of 26 nondeep learning MLA was trained and compared using the open-source automated machine learning (AutoML) tool Lazy Predict Classifier (LZP).
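The split-and-preprocess step can be sketched in plain Python as below. This is an illustrative sketch, not the authors' pipeline: the abstract does not say which imputation or scaling method was used, so median imputation and z-score standardization are assumptions, as are the function names, the toy feature values, and fitting both steps on the training set only (a common way to keep test-set information out of training). The reported counts (4,528 of 22,415 records, about 20.2%) are close to but not exactly 80:20, so the authors' splitting procedure may also differ from the simple random row split shown here.

```python
import random
import statistics
from typing import List, Optional, Tuple

# One cow-day record; None marks a missing sensor reading (hypothetical layout).
Row = List[Optional[float]]


def train_test_split(rows: List[Row], labels: List[int], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[Row], List[Row], List[int], List[int]]:
    """Randomly assign cow-day records to training and testing sets."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def fit_preprocessor(train: List[Row]) -> Tuple[List[float], List[float], List[float]]:
    """Learn per-feature medians, means, and SDs from the training rows only."""
    medians, means, sds = [], [], []
    for j in range(len(train[0])):
        observed = [r[j] for r in train if r[j] is not None]
        med = statistics.median(observed) if observed else 0.0
        filled = [r[j] if r[j] is not None else med for r in train]
        sd = statistics.pstdev(filled)
        medians.append(med)
        means.append(statistics.fmean(filled))
        sds.append(sd if sd > 0 else 1.0)  # avoid dividing by zero for constant features
    return medians, means, sds


def transform(rows: List[Row], medians: List[float], means: List[float],
              sds: List[float]) -> List[List[float]]:
    """Impute missing values with training medians, then z-score standardize."""
    return [[((v if v is not None else medians[j]) - means[j]) / sds[j]
             for j, v in enumerate(r)] for r in rows]


# Hypothetical toy records: [daily rumination (min), reticulorumen temperature (°C)]
ROWS: List[Row] = [[480.0, 39.1], [455.0, None], [510.0, 39.0], [300.0, 40.2], [None, 39.3],
                   [470.0, 39.2], [495.0, 38.9], [320.0, 40.0], [465.0, 39.1], [500.0, None]]
LABELS = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 1 = cow-day with a clinically diagnosed disorder

X_train, X_test, y_train, y_test = train_test_split(ROWS, LABELS)
params = fit_preprocessor(X_train)
Z_train, Z_test = transform(X_train, *params), transform(X_test, *params)
print(len(Z_train), len(Z_test))  # 8 2
```

Splitting by row, as here, can place records from the same cow in both sets; grouping the split by cow or by date is a stricter alternative if that leakage matters.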
After the best-performing nondeep learning algorithms (XGBoost, AdaBoost, Nearest Centroid, and Bernoulli Naive Bayes) were selected from the pool tested with LZP, these classifiers were compared with more complex deep learning algorithms not included in LZP (multilayer perceptron, recurrent neural networks, long short-term memory networks, and gated recurrent unit models). All algorithms were trained and evaluated on several performance metrics before a single best-performing algorithm was selected. Ensemble learning models, particularly XGBoost, achieved the best and most balanced performance, with a sensitivity of 82.4%, a precision of 42.6%, a specificity of 86.4%, and a negative predictive value of 97.6%. This model also had the highest F1-score (0.56) and area under the curve (84.4%). XGBoost was also robust to missing data. Our comprehensive approach to MLA screening and selection supported an informed choice of a suitable algorithm for identifying cows for clinical examination based on daily predictions of health disorder occurrence. Combining the AutoML tool LZP with manual refinement and testing of multiple MLA provided a robust framework for comparing ML models. Ensemble classification algorithms such as XGBoost and AdaBoost might outperform other deep learning and nondeep learning algorithms for classifying cow health daily using AHMS data and other cow management and performance indicators.
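Each reported metric is a ratio of cells in a binary confusion matrix (cow-days flagged or not flagged, against clinical diagnosis). The minimal pure-Python sketch below makes those definitions explicit. The confusion-matrix counts and model names are hypothetical, and ranking candidates by F1 alone is a simplification of the multi-metric selection the abstract describes. The one line that uses the paper's numbers recomputes F1 from the reported precision (42.6%) and sensitivity (82.4%), which reproduces the reported 0.56.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # cow-days with a disorder that the model flagged for examination
    fp: int  # healthy cow-days flagged (unnecessary examinations)
    tn: int  # healthy cow-days not flagged
    fn: int  # cow-days with a disorder that the model missed


def _ratio(num: float, den: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def f1_from(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    return _ratio(2 * precision * sensitivity, precision + sensitivity)


def metrics(c: Confusion) -> Dict[str, float]:
    sens = _ratio(c.tp, c.tp + c.fn)  # sensitivity (recall)
    prec = _ratio(c.tp, c.tp + c.fp)  # precision (positive predictive value)
    return {
        "sensitivity": sens,
        "precision": prec,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),  # negative predictive value
        "f1": f1_from(prec, sens),
    }


# Check against the abstract: precision 42.6% and sensitivity 82.4% give F1 of about 0.56.
print(round(f1_from(0.426, 0.824), 2))  # 0.56

# Hypothetical confusion matrices for three candidate models (not the study's data).
CANDIDATES = {
    "model_a": Confusion(tp=80, fp=120, tn=780, fn=20),
    "model_b": Confusion(tp=60, fp=40, tn=860, fn=40),
    "model_c": Confusion(tp=95, fp=400, tn=500, fn=5),
}
scores = {name: metrics(c) for name, c in CANDIDATES.items()}
best = max(scores, key=lambda name: scores[name]["f1"])
print(best)  # model_b
```

Here model_a has higher sensitivity than model_b but loses on F1. A single-metric ranking can hide that kind of trade-off, which is one reason to compare several metrics before selecting a model.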


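Source journal

Journal of Dairy Science (Agricultural and Forestry Sciences / Dairy and Animal Science)

CiteScore: 7.90

Self-citation rate: 17.10%

Articles published: 784

Review time: 4.2 months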

Journal description: The official journal of the American Dairy Science Association®, Journal of Dairy Science® (JDS) is the leading peer-reviewed general dairy research journal in the world. JDS readers represent education, industry, and government agencies in more than 70 countries with interests in biochemistry, breeding, economics, engineering, environment, food science, genetics, microbiology, nutrition, pathology, physiology, processing, public health, quality assurance, and sanitation.