An integrated grade classification model to evaluate raw milk quality

Impact factor 7.7 · Zone 1 (Agricultural and Forestry Sciences) · JCR Q1 (Agriculture, Multidisciplinary)
Xin Yang, Yan Wang, Debao Nie, Qinggang Zhang, Wei Zheng, Baisheng Dai, Weizheng Shen
Computers and Electronics in Agriculture, Volume 237, Article 110565. Published 2025-06-07.
DOI: 10.1016/j.compag.2025.110565
https://www.sciencedirect.com/science/article/pii/S0168169925006714

Abstract

The quality of raw milk is crucial for both dairy farming and the dairy industry. This study presents an integrated grade classification model that evaluates raw milk quality based on fat content, protein content, and somatic cell count. Near-infrared (NIR) technology was employed to develop a rapid classification model. To address the challenge of modeling the complex nonlinear relationship between raw milk quality grades and spectral variables, a novel hybrid variable selection method built on Extreme Gradient Boosting (XGBoost) is proposed. A total of 617 raw milk samples were collected and divided into three quality grades. First, several preprocessing methods were applied to the raw milk spectral data, including Savitzky-Golay smoothing, standard normal variate (SNV), multiplicative scatter correction, and first derivative; SNV was chosen for noise removal according to its performance. Then, XGBoost-based forward feature selection (XGBFFS), further optimized by a genetic algorithm (GA), was used to select variables. In XGBFFS, variable importance values were computed by XGBoost and variables were selected by forward feature selection; the GA was then employed to further optimize and reduce the variable space. The XGBFFS-GA method was applied to the quality evaluation of raw milk and compared with traditional variable selection methods, including ReliefF, uninformative variable elimination, and competitive adaptive reweighted sampling. Integrated models were built with Support Vector Machine (SVM) and XGBoost for the different variable selection methods. The results indicated that variable selection methods based on XGBoost effectively reduce the variable space and that XGBFFS-GA gave the best performance for the quality evaluation of raw milk. Finally, the XGBFFS-GA-SVM model achieved the best results, with a prediction set accuracy of 94.84% and an F1 score of 94.21%. This study introduces a new approach to variable selection in NIR spectroscopy analysis and a rapid integrated grade classification model for raw milk quality evaluation.
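
To make two of the steps named in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It shows SNV preprocessing and one plausible reading of importance-ordered forward selection. The function names `snv` and `forward_select`, the use of the sample standard deviation, the "keep a variable only if the score improves" rule, and the `score_fn` callable are all assumptions introduced for illustration; the abstract does not specify these details. In practice the importance values might come from a fitted XGBoost model and `score_fn` from cross-validated classifier accuracy, and the GA refinement stage is not shown.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own.

    Assumption: the sample standard deviation (ddof=1) is used; some
    implementations use the population standard deviation instead.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def forward_select(importances, score_fn):
    """Greedy forward selection over variables ranked by importance.

    importances: one importance value per spectral variable.
    score_fn: hypothetical callable that maps a list of variable indices
              to a validation score, where higher is better.
    Returns the selected indices and the best score reached.
    """
    order = np.argsort(importances)[::-1]  # most important variable first
    selected, best = [], float("-inf")
    for idx in order:
        candidate = selected + [int(idx)]
        score = score_fn(candidate)
        if score > best:  # keep the variable only if the score improves
            selected, best = candidate, score
    return selected, best
```

A caller would pass an array of shape (samples, wavelengths) to `snv`, then rank the resulting variables and hand the ranking to `forward_select` together with a scoring routine of their choice.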
Source journal
Computers and Electronics in Agriculture (Engineering and Technology: Computer Science, Interdisciplinary Applications)
CiteScore: 15.30
Self-citation rate: 14.50%
Articles published: 800
Review time: 62 days
Journal description: Computers and Electronics in Agriculture provides international coverage of advancements in computer hardware, software, electronic instrumentation, and control systems applied to agricultural challenges. Encompassing agronomy, horticulture, forestry, aquaculture, and animal farming, the journal publishes original papers, reviews, and application notes. It explores the use of computers and electronics in plant or animal agricultural production, covering topics like agricultural soils, water, pests, controlled environments, and waste. The scope extends to on-farm post-harvest operations and relevant technologies, including artificial intelligence, sensors, machine vision, robotics, networking, and simulation modeling. Its companion journal, Smart Agricultural Technology, continues the focus on smart applications in production agriculture.