Categorizing Chinese Wolfberry of Different Geographical Origins by Terahertz Spectroscopy and Machine Learning Algorithms

Impact factor: 2.8; JCR Q2, Food Science & Technology
Ningning Yu, Chengqian You, Chunyi Zhang, Wanru Chi, Xu Yang, Min Yuan, Qiuhong Qu, Pengfei Wang* and Mingxia He
Journal: ACS Food Science & Technology, vol. 5, no. 9, pp. 3353–3360
Published: 2025-08-28
DOI: 10.1021/acsfoodscitech.5c00390
URL: https://pubs.acs.org/doi/10.1021/acsfoodscitech.5c00390

Abstract

The quality of Chinese wolfberry is closely associated with its geographical origin, so accurate origin classification matters for ensuring its efficacy. However, wolfberries from different origins are morphologically similar, which makes them difficult to tell apart. To address this issue, a combination of terahertz (THz) spectroscopy and machine learning algorithms is proposed. Chinese wolfberries from Ningxia, Gansu, and Xinjiang were selected and examined by THz spectroscopy. Four preprocessing methods were applied to the THz absorption spectra. Among them, Savitzky-Golay smoothing and moving average gave classification accuracy comparable to that of the raw data. Min-max normalization was the most effective method, with a prediction accuracy higher than 90%. Principal component analysis (PCA) was conducted to reduce the data dimensionality. Four machine learning algorithms, namely least-squares support vector machine (LSSVM), random forest (RF), partial least-squares discriminant analysis (PLS-DA), and extreme learning machine (ELM), were then implemented to achieve high-accuracy classification. Using the PLS-DA model on raw data, binary classification achieved accuracies of 90.96% and 97.61% in distinguishing Ningxia wolfberries from those of Gansu and Xinjiang origin, respectively. For ternary classification, the optimal combination of LSSVM with min-max normalization preprocessing achieved 99.17% accuracy and a Kappa coefficient of 0.990, significantly outperforming the other combinations. This approach provides a high-precision solution for geographical origin classification and demonstrates the feasibility of using THz spectroscopy to help ensure the pharmaceutical efficacy of traditional Chinese medicine.
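
The abstract names min-max normalization and moving-average smoothing as preprocessing steps and reports accuracy together with a Kappa coefficient. The following is a minimal Python sketch of those generic operations, not the authors' implementation. The function names, the window size, the edge handling in the smoother, and the toy labels are all assumptions made for illustration, and Kappa is taken here to be Cohen's kappa.

```python
from collections import Counter


def min_max_normalize(spectrum):
    """Rescale one spectrum linearly so its values span [0, 1]."""
    lo, hi = min(spectrum), max(spectrum)
    if hi == lo:
        # A flat spectrum carries no range information; return zeros.
        return [0.0 for _ in spectrum]
    return [(v - lo) / (hi - lo) for v in spectrum]


def moving_average(spectrum, window=3):
    """Centered moving average; the window shrinks at the edges (assumed)."""
    half = window // 2
    smoothed = []
    for i in range(len(spectrum)):
        segment = spectrum[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # Degenerate case: only one class present in both lists.
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not data from the paper): 12 samples, one error.
y_true = ["Ningxia"] * 4 + ["Gansu"] * 4 + ["Xinjiang"] * 4
y_pred = ["Ningxia"] * 4 + ["Ningxia"] + ["Gansu"] * 3 + ["Xinjiang"] * 4

print(min_max_normalize([2.0, 4.0, 6.0]))
print(moving_average([1.0, 2.0, 3.0, 4.0], window=3))
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

Kappa is lower than raw accuracy on the same predictions because it subtracts the agreement expected by chance, which is why the two figures are usually reported side by side for multi-class problems such as the ternary case.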
