An Improved Ensemble Learning Method for Protein Content Analysis of Corn with Small Sample by Near-Infrared Spectroscopy
Authors: Jing Liu, Shaohui Yu
Journal: Food Analytical Methods, vol. 17, no. 9, pp. 1383-1392
Published: 2024-08-26 (Journal Article)
DOI: 10.1007/s12161-024-02669-8
URL: https://link.springer.com/article/10.1007/s12161-024-02669-8
Impact factor: 2.6; JCR: Q2 (Food Science & Technology)
Citations: 0
Abstract
Near-infrared spectroscopy has become an important method for rapid, non-destructive detection in the food and agricultural fields. However, the accuracy of quantitative analysis is seriously restricted by the severe overlap of spectra and the high cost of standard samples. To reduce the impact of these problems, especially the small-sample-size problem, a novel method named weighted clustering ensemble partial least squares (WCE-PLS) is proposed for the protein content analysis of corn. First, a clustering and sampling strategy is applied to the corn calibration set to create different subsets for generating sub-models. Then, the root mean square error of cross-validation (RMSECV) of each sub-model is computed as the key criterion for model optimization. Finally, in the integration step, two Gaussian weighting functions are defined to determine the weights of the sub-models. The proposed method is validated on near-infrared spectral data sets of corn and compared with single PLS, bagging PLS, boosting PLS, and data augmentation (DA) PLS. To further demonstrate its effectiveness, a soil data set was used for supplementary verification. Results on the prediction sets indicate that the root mean square errors of prediction (RMSEP) of WCE-PLS are clearly smaller than those of boosting PLS, and that the RMSEP values of WCE-PLS and bagging PLS are relatively small in most cases. Furthermore, the correlation coefficients between predicted and chemical values obtained with WCE-PLS are 0.96587 and 0.90849 for the two data sets, respectively, clearly higher than those obtained with the other four methods. The t test also showed that WCE-PLS has smaller t values and larger p values.
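The integration step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the two Gaussian weighting functions, so the form used here, w_i proportional to exp(-(e_i - e_min)^2 / (2 sigma^2)) with e_i the RMSECV of sub-model i, is an assumption chosen only to show the idea that sub-models with lower cross-validation error receive larger weights. The function names, the default choice of sigma, and the toy numbers are all hypothetical and do not come from the paper.

```python
import math


def gaussian_weights(rmsecv, sigma=None):
    """Map sub-model RMSECV values to normalized ensemble weights.

    Assumed form (not taken from the paper):
        w_i is proportional to exp(-(e_i - e_min)^2 / (2 * sigma^2))
    so the sub-model with the smallest RMSECV gets the largest weight.
    """
    e_min = min(rmsecv)
    if sigma is None:
        # Hypothetical default: standard deviation of the RMSECV values,
        # falling back to 1.0 when all values are identical.
        mean = sum(rmsecv) / len(rmsecv)
        var = sum((e - mean) ** 2 for e in rmsecv) / len(rmsecv)
        sigma = math.sqrt(var) or 1.0
    raw = [math.exp(-((e - e_min) ** 2) / (2.0 * sigma ** 2)) for e in rmsecv]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_predict(sub_predictions, weights):
    """Weighted sum of sub-model predictions.

    sub_predictions[i][j] is the prediction of sub-model i for sample j.
    """
    n_samples = len(sub_predictions[0])
    return [
        sum(w * preds[j] for w, preds in zip(weights, sub_predictions))
        for j in range(n_samples)
    ]


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


if __name__ == "__main__":
    # Toy, made-up numbers for illustration only (not data from the paper).
    rmsecv = [0.10, 0.12, 0.20]          # one RMSECV value per sub-model
    sub_predictions = [
        [8.9, 9.4, 10.1],                # sub-model 1 predictions
        [9.0, 9.6, 10.0],                # sub-model 2 predictions
        [8.5, 9.9, 10.6],                # sub-model 3 predictions
    ]
    y_true = [9.0, 9.5, 10.0]

    weights = gaussian_weights(rmsecv)
    y_hat = ensemble_predict(sub_predictions, weights)
    print("weights:", weights)
    print("ensemble RMSEP:", rmsep(y_true, y_hat))
```

In a full pipeline the sub-model predictions would come from PLS models fitted on the clustered and sampled calibration subsets; here they are supplied directly so that the weighting logic can be read in isolation.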
About the journal:
Food Analytical Methods publishes original articles, review articles, and notes on novel and/or state-of-the-art analytical methods or issues to be solved, as well as significant improvements or interesting applications to existing methods. These include analytical technology and methodology for food microbial contaminants, food chemistry and toxicology, food quality, food authenticity and food traceability. The journal covers fundamental and specific aspects of the development, optimization, and practical implementation in routine laboratories, and validation of food analytical methods for the monitoring of food safety and quality.