An integrated grade classification model to evaluate raw milk quality
Xin Yang, Yan Wang, Debao Nie, Qinggang Zhang, Wei Zheng, Baisheng Dai, Weizheng Shen
Computers and Electronics in Agriculture, Volume 237, Article 110565. Published 2025-06-07. Impact factor: 7.7 (JCR Q1, Agriculture, Multidisciplinary).
DOI: 10.1016/j.compag.2025.110565
https://www.sciencedirect.com/science/article/pii/S0168169925006714
Citations: 0
Abstract
The quality of raw milk is crucial for both dairy farming and the dairy industry. This study presents an integrated grade classification model that evaluates raw milk quality based on fat content, protein content, and somatic cell count. Near-infrared (NIR) technology was employed to develop a rapid classification model. To address the challenge of modeling the complex nonlinear relationship between raw milk quality grades and spectral variables, a novel hybrid variable selection method based on Extreme Gradient Boosting (XGBoost) is proposed. A total of 617 raw milk samples were collected and divided into three quality grades. First, several preprocessing methods were applied to the raw milk spectral data: Savitzky-Golay smoothing, standard normal variate (SNV), multiplicative scatter correction, and first derivative. SNV was chosen for noise removal on the basis of its performance. Then, XGBoost-based forward feature selection (XGBFFS), further optimized by a genetic algorithm (GA), was used to select variables. In XGBFFS, variable importance values were computed by XGBoost and variables were then chosen by forward feature selection. The GA was then employed to further optimize and reduce the variable space. The XGBFFS-GA method was applied to the quality evaluation of raw milk and compared with traditional variable selection methods, including ReliefF, uninformative variable elimination, and competitive adaptive reweighted sampling. Integrated models were built with support vector machine (SVM) and XGBoost classifiers for each variable selection method. The results indicated that the XGBoost-based variable selection methods effectively reduced the variable space and that XGBFFS-GA gave the best performance for quality evaluation of raw milk. The XGBFFS-GA-SVM model achieved the best results, with a prediction set accuracy of 94.84% and an F1 score of 94.21%. This study introduces a new idea for variable selection in NIR spectroscopy analysis and a rapid integrated grade classification model for raw milk quality evaluation.
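The abstract names standard normal variate (SNV) as the chosen preprocessing step but does not spell it out. As a minimal sketch, assuming the usual definition (each spectrum is centred on its own mean and divided by its own standard deviation), SNV could look like the following. The function name `snv`, the use of the sample standard deviation, and the two made-up spectra are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, stdev


def snv(spectra):
    """Apply standard normal variate to each spectrum (row) independently."""
    corrected = []
    for spectrum in spectra:
        mu = mean(spectrum)
        sigma = stdev(spectrum)  # sample standard deviation (n - 1); an assumption
        if sigma == 0:
            raise ValueError("SNV is undefined for a constant spectrum")
        corrected.append([(x - mu) / sigma for x in spectrum])
    return corrected


# Two made-up absorbance spectra (hypothetical values).
# The second is the first multiplied by 2 and shifted by 0.5,
# mimicking a multiplicative and additive scatter effect.
spectra = [
    [0.10, 0.20, 0.40, 0.30],
    [0.70, 0.90, 1.30, 1.10],
]

for row in snv(spectra):
    print([round(v, 3) for v in row])
```

Because SNV removes a per-spectrum offset and scale, the two rows come out the same after correction, which is the property that makes it useful against scatter effects in NIR data.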
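The XGBFFS step is described only at a high level: variables are ranked by XGBoost importance and then chosen by forward feature selection. The sketch below shows one generic way such a ranked forward pass could work. It is not the authors' implementation: the acceptance rule (keep a variable only if the score improves), the helper names `forward_select` and `toy_score`, and the hard-coded importance values are all assumptions for illustration. In practice the importances would come from a trained XGBoost model and the scorer would be something like cross-validated accuracy.

```python
def forward_select(ranked_variables, score_subset):
    """Greedy forward selection over variables pre-sorted by importance.

    ranked_variables: variable indices, most important first.
    score_subset: callable taking a list of indices and returning a
                  score where higher is better.
    """
    selected = []
    best_score = float("-inf")
    for var in ranked_variables:
        candidate = selected + [var]
        score = score_subset(candidate)
        if score > best_score:  # keep the variable only if it improves the score
            selected = candidate
            best_score = score
    return selected, best_score


# Hypothetical importance values for five spectral variables (index: importance).
importances = {0: 0.05, 3: 0.40, 5: 0.15, 7: 0.30, 9: 0.10}
ranked = sorted(importances, key=importances.get, reverse=True)

# Toy stand-in for a model-based scorer: variables 3 and 7 are "informative",
# and every selected variable carries a small penalty.
INFORMATIVE = {3, 7}


def toy_score(subset):
    return len(INFORMATIVE.intersection(subset)) - 0.01 * len(subset)


selected, score = forward_select(ranked, toy_score)
print(selected, round(score, 2))
```

A subset found this way could then be handed to a second optimizer, such as the genetic algorithm mentioned in the abstract, to prune it further.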
About the journal:
Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics such as agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.