Selection of the most informative wavenumbers to improve prediction accuracy of milk fatty acid profile based on milk mid‐infrared spectra data

Wenqi Lou, Luiz F. Brito, Xiuxin Zhao, Valentina Bonfatti, Jianbin Li, Yachun Wang
Animal Research and One Health (published 2024-07-08). DOI: 10.1002/aro2.72

Abstract

Milk mid‐infrared (MIR) spectra have been shown to provide valuable information on a wide range of traits that can be used in dairy cattle breeding programs. Selecting the most informative variables from such complex data can improve prediction accuracy and model robustness and, consequently, the interpretability of MIR spectra. We therefore investigated the prediction performance of feature selection methods applied to MIR spectra, using the milk fatty acid (FA) profile as an example to illustrate the procedure. MIR spectra, milk test‐day records, and reference FA concentrations from 155 first‐parity Holstein cows were used in the analyses. Four models with different sets of explanatory variables and three feature selection methods were evaluated. The results indicated that the competitive adaptive reweighted sampling (CARS) method effectively selected the most informative variables from the MIR spectra, yielding higher prediction accuracies than the other variable selection approaches. Using partial least squares regression, the model that combined the selected MIR spectral variables with cow information variables produced the best FA profile predictions. For C8:0, C10:0, C14:1, C17:0 isomers, C18:1, C18:1 isomer, medium‐chain FA, unsaturated FA, monounsaturated FA, and polyunsaturated FA, prediction accuracies measured as the coefficient of determination (R²) ranged from 0.66 to 0.85 in internal validation and from 0.65 to 0.84 in external validation. The wavenumbers most strongly associated with the 35 FAs were located between 1003 and 1145 cm−1. Overall, using CARS and cow information improved MIR‐based predictions of FAs in Chinese Holstein dairy cows. Additional validation studies should be conducted as larger datasets become available.
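The abstract combines three technical ingredients: CARS for wavenumber selection, partial least squares (PLS) regression, and the coefficient of determination as the accuracy measure. The sketch below illustrates how they fit together on synthetic data; it is not the authors' pipeline. It assumes a simplified, deterministic CARS variant that keeps the exponentially decreasing function (the retained fraction shrinks from r_1 = 1 to r_N = 2/p) and the ranking of variables by absolute PLS regression coefficient, but omits the Monte Carlo sampling and adaptive reweighted sampling steps of full CARS and scores each candidate subset on a single validation split instead of cross-validation. The PLS step is a plain NIPALS PLS1 for one response. All function names (`pls1_fit`, `cars_like_select`, etc.) and the data are hypothetical.

```python
# Hypothetical, simplified sketch; not the authors' implementation of CARS + PLS.
import math
import random

def mean(values):
    return sum(values) / len(values)

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 for one response; X is a list of rows, y a list of floats."""
    n, p = len(X), len(X[0])
    x_mean = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [yi - y_mean for yi in y]  # centred y
    comps = []
    for _ in range(min(n_comp, p)):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [wj / norm for wj in w]  # unit weight vector
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(ti * ti for ti in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]  # deflate y
        comps.append((w, load, q))
    return x_mean, y_mean, comps

def pls1_predict(model, x):
    x_mean, y_mean, comps = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in comps:  # replay the training deflation on the new sample
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * lj for ej, lj in zip(e, load)]
    return y_hat

def pls1_coef(model):
    """Regression coefficients, read off because y_hat is linear in x - x_mean."""
    x_mean, y_mean, _ = model
    coefs = []
    for j in range(len(x_mean)):
        x = list(x_mean)
        x[j] += 1.0
        coefs.append(pls1_predict(model, x) - y_mean)
    return coefs

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def edf_ratio(i, n_iter, p):
    """CARS exponentially decreasing function r_i = a*exp(-k*i); r_1 = 1, r_N = 2/p."""
    a = (p / 2) ** (1 / (n_iter - 1))
    k = math.log(p / 2) / (n_iter - 1)
    return a * math.exp(-k * i)

def columns(X, idx):
    return [[row[j] for j in idx] for row in X]

def cars_like_select(X_cal, y_cal, X_val, y_val, n_comp=3, n_iter=15):
    """Deterministic CARS-style elimination (no Monte Carlo sampling or ARS)."""
    p = len(X_cal[0])
    kept = list(range(p))
    best_rmse, best_set = float("inf"), list(kept)
    for i in range(1, n_iter + 1):
        model = pls1_fit(columns(X_cal, kept), y_cal, n_comp)
        pred = [pls1_predict(model, x) for x in columns(X_val, kept)]
        rmse = math.sqrt(mean([(a - b) ** 2 for a, b in zip(y_val, pred)]))
        if rmse < best_rmse:
            best_rmse, best_set = rmse, list(kept)
        if i < n_iter:
            # Keep the variables with the largest |coefficient|, sized by the EDF.
            n_keep = min(len(kept), max(2, round(edf_ratio(i + 1, n_iter, p) * p)))
            ranked = sorted(zip(pls1_coef(model), kept), key=lambda cb: -abs(cb[0]))
            kept = sorted(j for _, j in ranked[:n_keep])
    return best_set, best_rmse

if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(120)]  # synthetic "spectra"
    y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.01) for r in X]  # only variables 0 and 1 matter
    selected, rmse = cars_like_select(X[:90], y[:90], X[90:], y[90:])
    model = pls1_fit(columns(X[:90], selected), y[:90], 3)
    pred = [pls1_predict(model, x) for x in columns(X[90:], selected)]
    print(selected, round(rmse, 4), round(r2(y[90:], pred), 3))
```

Pure Python is used only to keep the sketch dependency‐free. With real spectra (hundreds of wavenumbers), a vectorized PLS implementation such as scikit‐learn's `PLSRegression` would be more practical, and full CARS would additionally draw a random calibration subset in each of its N sampling runs and use adaptive reweighted sampling when shrinking the variable set.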