A machine learning framework for precision prediction of lactation performance in large dairy herds: Integrating dietary, environmental, and health risk factors
Evan Y. Liu, Shuiping Wang, Bihong Zhang, Nazir Ahmad Khan, Shaoxun Tang, Chuanshe Zhou, Zhixiong He, Zhiliang Tan, Yong Liu
Computers and Electronics in Agriculture, Volume 238, Article 110832 (published 2025-08-04). DOI: 10.1016/j.compag.2025.110832. https://www.sciencedirect.com/science/article/pii/S016816992500938X
Journal Impact Factor 8.9; JCR Q1 (Agriculture, Multidisciplinary).
Precision management of large dairy herds requires accurate prediction of daily milk yield (DMY) and early identification of health risks. This study presents a machine learning (ML) framework with two interlinked objectives: (1) high-accuracy DMY prediction and (2) early disease detection. The framework uses a 5-year dataset from 19,798 Holstein cows. Our novel dual modeling pipeline integrates advanced ML models, including Random Forest (RF), Gradient Boosting Regression (GBR), and Extreme Gradient Boosting (XGB), to predict DMY from dynamic environmental, dietary, and physiological inputs. RF emerged as the top model for DMY prediction (R² = 0.77, RMSE = 5.72 kg/d), capturing nonlinear interactions among lactation stage, parity, and nutrient intake. Shapley Additive exPlanations (SHAP) analysis identified days in lactation, parity, and dietary ether extract (EE) as key drivers. A 1 % increase in EE raised DMY by 2.1 kg/d in multiparous cows, whereas neutral detergent fiber (NDF) above 35 % reduced DMY by 4.2 kg/d because of rumen fill limitations. For disease detection, an RF classifier combined with SMOTE (synthetic minority over-sampling technique) achieved robust performance (AUC = 0.93, sensitivity = 0.80), enabling early identification of mastitis and ketosis via yield deviations. Real-time alerts triggered when the temperature-humidity index (THI) exceeded 72 reduced yield losses by 4.8 kg/d. Practical applications include dietary optimization (16.8 % crude protein and 5.8 % EE during peak lactation) and automated health alerts. This study advances precision dairy farming by providing a scalable, interpretable framework that bridges ML innovation with actionable herd management strategies, validated across diverse lactation stages and environmental conditions. Future work will integrate genomic and IoT-enabled sensor data to enhance predictive accuracy and adaptability.
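The two automated alerts described in the abstract can be illustrated with a minimal sketch. The first is a heat-stress alert at THI > 72. The second flags a cow whose observed yield falls well below the model's predicted yield.

This is not the authors' implementation, and several parts are assumptions:

- The abstract does not say which THI formula was used. The widely cited NRC (1971) formula is assumed here.
- The deviation cutoff is hypothetical. It reuses the reported RMSE of 5.72 kg/d purely as an example value.
- A simple threshold rule stands in for the paper's SMOTE-balanced RF classifier.
- The names `thi`, `CowDay`, and `daily_alerts` are illustrative only.

```python
from dataclasses import dataclass

# Threshold stated in the abstract: alert when THI exceeds 72.
THI_ALERT_THRESHOLD = 72.0

# HYPOTHETICAL cutoff: the abstract gives no deviation threshold; the reported
# RF RMSE (5.72 kg/d) is reused here only as an example value.
DEVIATION_ALERT_KG = 5.72


def thi(temp_c: float, rh_percent: float) -> float:
    """Temperature-humidity index via the NRC (1971) formula.

    ASSUMPTION: the source does not specify its THI formula.
    temp_c is dry-bulb temperature in degrees C; rh_percent is relative humidity in %.
    """
    temp_f = 1.8 * temp_c + 32.0
    return temp_f - (0.55 - 0.0055 * rh_percent) * (1.8 * temp_c - 26.0)


@dataclass
class CowDay:
    """One cow's record for one day (hypothetical structure)."""
    cow_id: str
    observed_dmy: float   # measured daily milk yield, kg/d
    predicted_dmy: float  # yield predicted by a model such as the RF regressor, kg/d


def daily_alerts(records, temp_c, rh_percent, threshold=DEVIATION_ALERT_KG):
    """Return heat-stress and yield-shortfall alert messages for one day."""
    alerts = []
    index = thi(temp_c, rh_percent)
    if index > THI_ALERT_THRESHOLD:
        alerts.append(f"heat stress: THI {index:.1f} > {THI_ALERT_THRESHOLD:.0f}")
    for r in records:
        # Positive shortfall means the cow produced less than predicted.
        shortfall = r.predicted_dmy - r.observed_dmy
        if shortfall > threshold:
            alerts.append(f"cow {r.cow_id}: yield {shortfall:.1f} kg/d below prediction")
    return alerts


if __name__ == "__main__":
    herd = [CowDay("A1", 38.0, 37.5), CowDay("B7", 29.0, 36.4)]
    for message in daily_alerts(herd, temp_c=30.0, rh_percent=60.0):
        print(message)
```

Consider a hot, humid day (30 °C, 60 % RH), which gives THI ≈ 79.8. On that day the sketch reports both a heat-stress alert and a shortfall alert for cow B7, whose yield is 7.4 kg/d below prediction. A fixed cutoff keeps the rule transparent. The paper's classifier instead learns from labeled disease cases, so the two should not be expected to flag the same animals.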
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.