Selection of the most informative wavenumbers to improve prediction accuracy of milk fatty acid profile based on milk mid-infrared spectra data

Wenqi Lou, Luiz F. Brito, Xiuxin Zhao, Valentina Bonfatti, Jianbin Li, Yachun Wang

Animal Research and One Health, 2(4), 417-430. Published 2024-07-08. DOI: 10.1002/aro2.72
Abstract
Milk mid-infrared (MIR) spectra have been shown to provide valuable information on a wide range of traits used in dairy cattle breeding programs. Selecting the most informative variables from complex data can improve prediction accuracy and model robustness and, consequently, the interpretability of MIR spectra. Thus, we aimed to investigate the prediction performance of feature selection methods applied to MIR spectra data, using the milk fatty acid (FA) profile as an example to illustrate the evaluated procedure. MIR spectra, milk test-day records, and reference FA concentrations from 155 first-parity Holstein cows were used in the analyses. Four models comprising different explanatory variables and three feature selection methods were evaluated. The results indicated that the competitive adaptive reweighted sampling (CARS) method can effectively select the most informative variables from the MIR spectra, resulting in higher prediction accuracies than the other variable selection approaches. The model including the selected MIR spectral variables and cow information variables yielded the best FA profile predictions based on partial least squares regression. C8:0, C10:0, C14:1, C17:0 isomers, C18:1, C18:1 isomer, medium-chain FA, unsaturated FA, monounsaturated FA, and polyunsaturated FA had prediction accuracies, measured as the coefficient of determination, ranging from 0.66 to 0.85 in internal validation and from 0.65 to 0.84 in external validation. The wavenumbers most strongly related to the 35 FAs were located between 1003 and 1145 cm−1. Overall, using CARS and cow information improved predictions of FAs based on MIR spectra in Chinese Holstein dairy cows. Additional validation studies should be conducted as larger datasets become available.
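The abstract names CARS and partial least squares (PLS) regression without describing how they fit together. The following is a minimal, hedged sketch of the general CARS loop as it is commonly described: Monte Carlo subsampling of samples, PLS regression coefficients used as variable weights, an exponentially decreasing function (EDF) that forces out low-weight variables, adaptive reweighted sampling (ARS) among the survivors, and selection of the subset with the lowest cross-validated RMSE. It runs on synthetic data with NumPy only. The function names (`pls1_coef`, `rmsecv`, `cars`), the 80% sampling fraction, 50 runs, at most 5 PLS components, and 5-fold cross-validation are illustrative assumptions, not the authors' settings or implementation.

```python
import numpy as np


def pls1_coef(X, y, n_comp):
    """Single-response PLS via NIPALS; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xa.T @ ya
        norm = np.linalg.norm(w)
        if norm < 1e-12:  # nothing left to explain
            break
        w /= norm
        t = Xa @ w
        tt = t @ t
        p = Xa.T @ t / tt
        q = ya @ t / tt
        Xa = Xa - np.outer(t, p)  # deflate X
        ya = ya - q * t           # deflate y
        W.append(w); P.append(p); Q.append(q)
    if not W:
        return np.zeros(X.shape[1]), y_mean
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)  # B = W (P'W)^-1 q
    return coef, y_mean - x_mean @ coef


def rmsecv(X, y, n_comp, n_folds=5, seed=0):
    """k-fold cross-validated RMSE (fixed folds so subsets are comparable)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef, b0 = pls1_coef(X[train], y[train], n_comp)
        sq_err += np.sum((y[test] - X[test] @ coef - b0) ** 2)
    return np.sqrt(sq_err / len(y))


def cars(X, y, n_runs=50, max_comp=5, frac=0.8, seed=0):
    """Hypothetical CARS sketch: returns (best variable subset, RMSECV per run)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # EDF: retained ratio decays from 1 (run 1) to 2/p (last run)
    a = (p / 2) ** (1 / (n_runs - 1))
    k = np.log(p / 2) / (n_runs - 1)
    retained = np.arange(p)
    subsets, errors = [], []
    for i in range(1, n_runs + 1):
        rows = rng.choice(n, size=int(frac * n), replace=False)  # Monte Carlo subset
        coef, _ = pls1_coef(X[rows][:, retained], y[rows], min(max_comp, len(retained)))
        w = np.abs(coef) / np.abs(coef).sum()  # normalized |coefficient| weights
        n_keep = max(2, int(round(p * a * np.exp(-k * i))))
        top = np.argsort(w)[::-1][:n_keep]  # EDF: forced removal of low weights
        drawn = rng.choice(len(top), size=n_keep, replace=True,
                           p=w[top] / w[top].sum())  # ARS: weighted sampling
        retained = np.sort(retained[top[np.unique(drawn)]])
        subsets.append(retained)
        errors.append(rmsecv(X[:, retained], y, min(max_comp, len(retained))))
    return subsets[int(np.argmin(errors))], errors


# Synthetic demo: 120 "spectra" x 60 hypothetical wavenumbers; only 10-12 matter.
data_rng = np.random.default_rng(42)
X = data_rng.standard_normal((120, 60))
y = 3 * X[:, 10] + 2 * X[:, 11] - 2 * X[:, 12] + 0.1 * data_rng.standard_normal(120)
selected, errors = cars(X, y)
print("selected columns:", selected.tolist(), "min RMSECV:", round(min(errors), 3))
```

Because every candidate subset is scored on the same cross-validation folds, the returned subset is simply the one with the lowest RMSECV across runs. In a real MIR application the columns would be spectral wavenumbers and `y` a reference FA concentration, and the retained columns could then be combined with cow information variables in a final PLS model.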