Identification of geographical origins of Gastrodia elata Blume based on multisource data fusion
Authors: Hong Liu, Honggao Liu, Jieqing Li, Yuanzhong Wang
Journal: Phytochemical Analysis, pp. 1704-1716 (published 2024-10-01; Epub 2024-06-27)
Impact factor: 3.0; JCR Q2 (Biochemical Research Methods)
DOI: https://doi.org/10.1002/pca.3413
Abstract
Introduction: Identifying the geographical origin of Gastrodia elata Blume (G. elata Bl.) supports the scientific and rational use of this medicinal material. In this study, infrared spectroscopy was combined with machine learning algorithms to distinguish the origin of G. elata Bl.
Objective: To achieve rapid and accurate identification of the origin of G. elata Bl.
Materials and methods: Attenuated total reflection Fourier transform infrared (ATR-FTIR) spectra and Fourier transform near-infrared (FT-NIR) spectra were collected for 306 samples of G. elata Bl. First, support vector machine (SVM) models were established on the single-spectrum data and on the full-spectrum fusion data. To investigate whether a feature-level fusion strategy can improve model performance, a sequential and orthogonalized partial least squares discriminant analysis (SO-PLS-DA) model was established to extract and combine the features of the two types of spectra. Next, six algorithms were used to extract feature variables, and an SVM model was established on the resulting feature-level fusion data. Finally, to avoid complicated preprocessing and feature extraction, a residual convolutional neural network (ResNet) model was established after converting the raw spectral data into spectral images.
Results: The feature-level fusion model is more accurate than both the single-spectrum models and the full-spectrum fusion model, and SO-PLS-DA is simpler than SVM-based feature-level fusion. The ResNet model classifies well but needs more data to improve its generalization and training effectiveness.
Conclusion: Sequential and orthogonalized data fusion approaches and ResNet models are powerful solutions for identifying the geographical origin of G. elata Bl.
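The two fusion strategies compared in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy numbers, the function names, and the Fisher-score ranking (a stand-in for the six unnamed feature-extraction algorithms) are all assumptions made for illustration, and the classifier itself (SVM in the paper) is left out. Full-spectrum (low-level) fusion scales each spectral block and joins them side by side; feature-level fusion first keeps a few informative variables per block and then joins those.

```python
from statistics import mean, pstdev

def autoscale(block):
    """Standardize each column of one spectral block (rows = samples)."""
    cols = list(zip(*block))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in block]

def low_level_fusion(block_a, block_b):
    """Full-spectrum fusion: scale each block, then concatenate row-wise."""
    a, b = autoscale(block_a), autoscale(block_b)
    return [ra + rb for ra, rb in zip(a, b)]

def fisher_scores(block, labels):
    """Between-class over within-class scatter for every variable."""
    scores = []
    for col in zip(*block):
        overall = mean(col)
        between = within = 0.0
        for c in set(labels):
            vals = [x for x, y in zip(col, labels) if y == c]
            mu_c = mean(vals)
            between += len(vals) * (mu_c - overall) ** 2
            within += sum((x - mu_c) ** 2 for x in vals)
        scores.append(between / (within + 1e-12))
    return scores

def top_k_by_fisher(block, labels, k):
    """Hypothetical selector: indices of the k highest-scoring variables."""
    scores = fisher_scores(block, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def feature_level_fusion(block_a, keep_a, block_b, keep_b):
    """Keep selected variables per block, scale them, then concatenate."""
    a = autoscale([[row[j] for j in keep_a] for row in block_a])
    b = autoscale([[row[j] for j in keep_b] for row in block_b])
    return [ra + rb for ra, rb in zip(a, b)]

# Toy data (invented): 4 samples from two origins, 3 ATR-FTIR and 4 FT-NIR variables.
labels = ["A", "A", "B", "B"]
atr = [[1.0, 5.0, 0.2], [1.1, 5.2, 0.9], [3.0, 5.1, 0.3], [3.2, 4.9, 0.8]]
nir = [[10.0, 0.5, 7.0, 2.0], [10.2, 0.6, 7.1, 2.9],
       [10.1, 1.5, 7.0, 2.1], [9.9, 1.6, 6.9, 3.0]]

low = low_level_fusion(atr, nir)
keep_atr = top_k_by_fisher(atr, labels, 1)
keep_nir = top_k_by_fisher(nir, labels, 2)
feat = feature_level_fusion(atr, keep_atr, nir, keep_nir)
print(len(low[0]), len(feat[0]))  # 7 3
```

In practice the scaling parameters and the selected variables would be fitted on the training split only and then applied to the test split; the sketch skips that step for brevity.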
About the journal:
Phytochemical Analysis is devoted to the publication of original articles concerning the development, improvement, validation and/or extension of application of analytical methodology in the plant sciences. The spectrum of coverage is broad, encompassing methods and techniques relevant to the detection (including bio-screening), extraction, separation, purification, identification and quantification of compounds in plant biochemistry, plant cellular and molecular biology, plant biotechnology, the food sciences, agriculture and horticulture. The Journal publishes papers describing significant novelty in the analysis of whole plants (including algae), plant cells, tissues and organs, plant-derived extracts and plant products (including those which have been partially or completely refined for use in the food, agrochemical, pharmaceutical and related industries). All forms of physical, chemical, biochemical, spectroscopic, radiometric, electrometric, chromatographic, metabolomic and chemometric investigations of plant products (monomeric species as well as polymeric molecules such as nucleic acids, proteins, lipids and carbohydrates) are included within the remit of the Journal. Papers dealing with novel methods relating to areas such as data handling and data mining in the plant sciences are also welcome.