Ningning Yu, Chengqian You, Chunyi Zhang, Wanru Chi, Xu Yang, Min Yuan, Qiuhong Qu, Pengfei Wang*, and Mingxia He
Categorizing Chinese Wolfberry of Different Geographical Origins by Terahertz Spectroscopy and Machine Learning Algorithms
ACS Food Science & Technology, 5 (9), 3353–3360. Published 2025-08-28. Journal Article.
DOI: 10.1021/acsfoodscitech.5c00390
https://pubs.acs.org/doi/10.1021/acsfoodscitech.5c00390
Abstract
The quality of Chinese wolfberry is closely tied to its geographical origin, so accurate origin classification matters for assuring its efficacy. However, wolfberries from different origins are morphologically similar, which makes them difficult to tell apart by appearance. To address this issue, a combination of terahertz (THz) spectroscopy and machine learning algorithms is proposed. Chinese wolfberries from Ningxia, Gansu, and Xinjiang were selected and examined by THz spectroscopy. Four preprocessing methods were applied to the THz absorption spectra. Among them, Savitzky-Golay smoothing and moving average gave classification accuracy comparable to that of the raw data, while min-max normalization was the most effective, yielding a prediction accuracy above 90%. Principal component analysis (PCA) was used to reduce the data dimensionality. Four machine learning algorithms, namely least-squares support vector machine (LSSVM), random forest (RF), partial least-squares discriminant analysis (PLS-DA), and extreme learning machine (ELM), were then applied for classification. Using the PLS-DA model on raw data, binary classification achieved accuracies of 90.96% and 97.61% in distinguishing Ningxia wolfberries from those of Gansu and Xinjiang, respectively. For ternary classification, the best combination, LSSVM with min-max normalization, achieved 99.17% accuracy and a Kappa coefficient of 0.990, significantly outperforming the other combinations. This approach provides a high-precision solution for geographical origin classification and demonstrates the feasibility of using THz spectroscopy to ensure the pharmaceutical efficacy of traditional Chinese medicine.
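The abstract reports min-max normalization as the most effective preprocessing step and uses accuracy and the Kappa coefficient to score the classifiers. As a minimal sketch of what those two quantities compute, the Python below rescales a single spectrum to [0, 1] and evaluates Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), from true and predicted labels. It is not the authors' implementation: the function names are hypothetical, normalizing each spectrum separately (rather than each frequency point across samples) is an assumption, and the sample values are made up for illustration.

```python
from collections import Counter


def min_max_normalize(spectrum):
    """Rescale one spectrum to [0, 1] (assumed per-spectrum scaling)."""
    lo, hi = min(spectrum), max(spectrum)
    span = hi - lo
    if span == 0:
        # A flat spectrum has no range to rescale; map it to zeros.
        return [0.0 for _ in spectrum]
    return [(value - lo) / span for value in spectrum]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and equally long")
    n = len(y_true)
    # Observed agreement p_o is the plain classification accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e comes from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both lists contain one identical class only.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Made-up absorption values, for illustration only.
    print(min_max_normalize([2.0, 4.0, 6.0]))
    # Made-up labels: three of four predictions agree with the truth.
    truth = ["Ningxia", "Ningxia", "Gansu", "Gansu"]
    guess = ["Ningxia", "Gansu", "Gansu", "Gansu"]
    print(cohen_kappa(truth, guess))
```

Because kappa subtracts the agreement expected by chance, it is a stricter score than raw accuracy when class sizes are unbalanced, which is presumably why the abstract reports both.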