An Improved Ensemble Learning Method for Protein Content Analysis of Corn with Small Sample by Near-Infrared Spectroscopy

IF 2.6 | Zone 3, Agricultural and Forestry Sciences | Q2, Food Science & Technology
Jing Liu, Shaohui Yu
Food Analytical Methods, vol. 17, no. 9, pp. 1383-1392. Journal article, published 2024-08-26. DOI: 10.1007/s12161-024-02669-8
https://link.springer.com/article/10.1007/s12161-024-02669-8
Citations: 0

Abstract

Near-infrared spectroscopy has become an important method for rapid, non-destructive detection in the food and agricultural fields. However, the accuracy of quantitative analysis is seriously restricted by the severe overlap of spectra and the high cost of standard samples. To reduce the impact of these problems, especially the small-sample-size problem, a novel method named weighted clustering ensemble partial least squares (WCE-PLS) is proposed for the protein content analysis of corn. First, a clustering and sampling strategy is applied to the corn calibration sets to create different subsets for generating sub-models. Then, the root mean square error of cross-validation (RMSECV) of each sub-model is computed as the key criterion for model optimization. Finally, in the integration step, two Gaussian weighting functions are defined to determine the weights of the sub-models. The proposed method is validated on near-infrared spectral data sets of corn and compared with single PLS, bagging PLS, boosting PLS, and data augmentation (DA) PLS. To further demonstrate its effectiveness, a soil data set is used for supplementary verification. Results on the prediction sets indicate that the root mean square error of prediction (RMSEP) of WCE-PLS is clearly smaller than that of boosting PLS, and that the RMSEP values of WCE-PLS and bagging PLS are relatively small in most cases. Furthermore, the correlation coefficients between predicted and chemical values obtained with WCE-PLS are 0.96587 and 0.90849 for the two data sets, respectively, clearly higher than those obtained with the other four methods. The t test also shows that WCE-PLS has smaller t values and larger p values.
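
The integration and evaluation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the two Gaussian weighting functions, so the function `gaussian_weights` below is an assumption: it weights each sub-model by a Gaussian of the gap between its RMSECV and the best RMSECV. The names `ensemble_predict`, `rmse`, and `pearson_r`, the default `sigma`, and all numbers are hypothetical and are not taken from the paper. The sub-model predictions are supplied directly; the clustering, sampling, and PLS fitting steps are not reproduced here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def gaussian_weights(rmsecv, sigma=None):
    """ASSUMED weighting: w_i proportional to exp(-(e_i - e_min)^2 / (2 sigma^2)).

    This is not the paper's formula. By default sigma is the population
    standard deviation of the RMSECV values.
    """
    e_min = min(rmsecv)
    if sigma is None:
        mean = sum(rmsecv) / len(rmsecv)
        sigma = math.sqrt(sum((e - mean) ** 2 for e in rmsecv) / len(rmsecv))
    if sigma == 0:  # all sub-models equally good: fall back to equal weights
        return [1.0 / len(rmsecv)] * len(rmsecv)
    raw = [math.exp(-((e - e_min) ** 2) / (2 * sigma ** 2)) for e in rmsecv]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_predict(sub_predictions, weights):
    """Weighted average of the sub-model predictions, sample by sample."""
    n_samples = len(sub_predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, sub_predictions))
        for i in range(n_samples)
    ]


# Made-up numbers for illustration only (not data from the paper).
y_ref = [8.2, 8.9, 9.4, 7.8]          # reference (chemical) values
sub_preds = [                          # predictions of three sub-models
    [8.0, 9.0, 9.5, 7.9],
    [8.6, 8.5, 9.0, 8.3],
    [8.3, 8.8, 9.3, 7.7],
]
rmsecv = [0.15, 0.40, 0.12]            # cross-validation error per sub-model

weights = gaussian_weights(rmsecv)
y_hat = ensemble_predict(sub_preds, weights)
print("weights:", [round(w, 3) for w in weights])
print("RMSEP:", round(rmse(y_ref, y_hat), 4))
print("r:", round(pearson_r(y_ref, y_hat), 4))
```

With this assumed weighting, a sub-model whose RMSECV is far from the best one contributes little to the final prediction, while sub-models with similar RMSECV share the weight almost equally. Other Gaussian forms, for example one centered on the mean RMSECV, would fit the abstract's description equally well.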

Source journal: Food Analytical Methods (Agricultural and Forestry Sciences, Food Science and Technology)
CiteScore: 6.00
Self-citation rate: 3.40%
Articles published: 244
Review time: 3.1 months
Journal description: Food Analytical Methods publishes original articles, review articles, and notes on novel and/or state-of-the-art analytical methods or issues to be solved, as well as significant improvements or interesting applications to existing methods. These include analytical technology and methodology for food microbial contaminants, food chemistry and toxicology, food quality, food authenticity, and food traceability. The journal covers fundamental and specific aspects of the development, optimization, and practical implementation in routine laboratories, and validation of food analytical methods for the monitoring of food safety and quality.