A prediction model of rubber content in the dried root of Taraxacum kok-saghyz Rodin based on near-infrared spectroscopy

IF 4.7 | Zone 2 (Biology) | Q1, Biochemical Research Methods
Runfeng Chen, Qingqing Yan, Tuhanguli Tuoheti, Lin Xu, Qiang Gao, Yan Zhang, Hailong Ren, Lipeng Zheng, Feng Wang, Ya Liu
Journal: Plant Methods
Publication date: 2024-05-26
Publication type: Journal Article
DOI: 10.1186/s13007-024-01183-6 (https://doi.org/10.1186/s13007-024-01183-6)
Citations: 0

A prediction model of rubber content in the dried root of Taraxacum kok-saghyz Rodin based on near-infrared spectroscopy
Taraxacum kok-saghyz Rodin (TKS) is a highly promising source of natural rubber (NR) because it can be grown across a wide range of areas, adapts well, and suits mechanized planting and harvesting. However, current methods for measuring NR content are cumbersome, so a rapid detection model is needed. This study used near-infrared (NIR) spectroscopy to build rapid detection models for NR content in TKS root segments and root powder samples.

NIR spectra were collected from the K445 strain at different growth stages within a year and from 129 TKS samples hybridized with dandelion. The rubber content of the sample roots was measured by the alkaline boiling method. Monte Carlo sampling (MCS) was used to filter abnormal data from the root segment and powder sample sets separately. The SPXY algorithm divided the samples into a training set and a validation set in a 3:1 ratio. The raw spectra were preprocessed with moving window smoothing (MWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (FD) algorithms. Bands were selected with the competitive adaptive reweighted sampling (CARS) algorithm and from the chemical characteristic bands of NR. Partial least squares (PLS), random forest (RF), light gradient boosting machine (LightGBM), and convolutional neural network (CNN) models were then built with the optimal spectral preprocessing on three band sets: the full band, the CARS-selected bands, and the chemical characteristic bands of NR. The model with the best predictive performance in the high rubber content interval (rubber content > 15%) was also identified.

The optimal rubber content prediction models for TKS root segments and powder samples were MWS–FD CARS–RF and MWS–FD chemical characteristic band RF, respectively. For root segments and powder samples, the $R_P^2$ values were 0.951 and 0.979, the root mean square errors of prediction (RMSEP) were 1.814 and 1.133, and the RPDP values were 4.498 and 6.845. In the high rubber content range, the LightGBM model had the best prediction performance, with RMSEP values of 0.752 for root segments and 0.918 for powder samples.

These results indicate that dried TKS root powder is better suited than root segments for building a rubber content prediction model and gives superior predictive capability. In the high rubber content range in particular, the LightGBM model performs best, which could offer a theoretical basis for future rapid detection of rubber content in TKS.
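The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the authors' implementation, so everything here is an assumption: the function names are hypothetical, the smoothing window of 5 points is arbitrary, the first derivative is taken as a simple finite difference, and $R_P^2$, RMSEP, and RPD follow the definitions commonly used in chemometrics (RPD as the standard deviation of the reference values divided by RMSEP).

```python
import numpy as np


def moving_window_smooth(spectra, window=5):
    """Moving-average smoothing along the wavelength axis (window must be odd)."""
    kernel = np.ones(window) / window
    pad = window // 2
    # Repeat edge values so the output keeps the original number of points.
    padded = np.pad(spectra, ((0, 0), (pad, pad)), mode="edge")
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def snv(spectra):
    """Standard normal variate: center and scale each spectrum individually."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        # Fit row = slope * ref + intercept, then undo the scatter terms.
        slope, intercept = np.polyfit(ref, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected


def first_derivative(spectra):
    """First derivative along the wavelength axis by finite differences."""
    return np.gradient(spectra, axis=1)


def prediction_metrics(y_true, y_pred):
    """Return (R2_P, RMSEP, RPD) for a validation set, using common definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    rmsep = np.sqrt(np.mean(residual ** 2))
    r2 = 1.0 - np.sum(residual ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=1) / rmsep
    return r2, rmsep, rpd


# Synthetic demo: each "spectrum" is a scaled and shifted copy of one base curve,
# which mimics multiplicative and additive scatter effects.
rng = np.random.default_rng(0)
base = np.sin(np.linspace(0.0, 3.0, 50)) + 2.0
scale = rng.uniform(0.8, 1.2, size=(6, 1))
offset = rng.uniform(-0.1, 0.1, size=(6, 1))
raw = scale * base + offset

mws_fd = first_derivative(moving_window_smooth(raw))  # the MWS-FD combination
r2_p, rmsep, rpd = prediction_metrics([10, 12, 14, 16, 18], [11, 11, 15, 15, 19])
print(mws_fd.shape, r2_p, rmsep, rpd)
```

Chaining `moving_window_smooth` and `first_derivative` mirrors the MWS–FD combination reported as optimal in the abstract; the band selection (CARS) and the RF or LightGBM regressors would operate on the resulting matrix and are not sketched here.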
Source journal: Plant Methods (Biology, Plant Sciences)
CiteScore: 9.20
Self-citation rate: 3.90%
Articles published: 121
Review time: 2 months
About the journal: Plant Methods is an open access, peer-reviewed, online journal for the plant research community that encompasses all aspects of technological innovation in the plant sciences. There is no doubt that we have entered an exciting new era in plant biology. The completion of the Arabidopsis genome sequence and the rapid progress being made in other plant genomics projects are providing unparalleled opportunities for progress in all areas of plant science. Nevertheless, enormous challenges lie ahead if we are to understand the function of every gene in the genome, and how the individual parts work together to make the whole organism. Achieving these goals will require an unprecedented collaborative effort, combining high-throughput, system-wide technologies with more focused approaches that integrate traditional disciplines such as cell biology, biochemistry and molecular genetics. Technological innovation is probably the most important catalyst for progress in any scientific discipline. Plant Methods' goal is to stimulate the development and adoption of new and improved techniques and research tools and, where appropriate, to promote consistency of methodologies for better integration of data from different laboratories.