Title: ATR-FTIR Coupled With Machine Learning Provides a Fast Method for Identifying and Distinguishing 55 Varieties of Fruit-Derived Medicinal Materials
Authors: Wen-Jie Zhao, Ya-Ling An, Chun-Qian Song, Yu-Shi Huang, Li-Jie Zhang, Kang-Nan Liu, Zhen-Wei Li, Xiao-Kang Liu, Dai-di Zhang, De-An Guo
Journal: Phytochemical Analysis (impact factor 3.0; JCR Q2, Biochemical Research Methods)
Publication date: 2025-05-29
Publication type: Journal Article
DOI: 10.1002/pca.3545 (https://doi.org/10.1002/pca.3545)
Abstract
Introduction: Fruit-derived medicinal materials (FDMM) are extensively utilized in daily life, yet the market is beset by substantial variety confusion, which undermines consumer rights and well-being. Consequently, accurate identification of these materials is essential for guaranteeing their quality, effectiveness, and safety.
Objectives: This study aimed to combine attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR) and machine learning (ML) techniques to differentiate and identify 55 kinds of FDMM.
Materials and methods: A total of 861 sample batches were collected, with 721 allocated for model establishment and 140 for independent validation. A PLS-DA model and nine machine learning algorithms were constructed: support vector machine (SVM), tree, K-nearest neighbor (KNN), discriminant, ensemble, support vector machine kernel (SVMK), logistic regression kernel (LRK), naive Bayes (NB), and neural network (NN). The optimal model was selected by weighing accuracy against computational efficiency and was then evaluated in terms of accuracy, precision, recall, and F1-score. It was further validated on 140 newly collected samples to assess its long-term stability after several months.
Results: Among the 10 classification models, the KNN model showed exceptional classification capability, with all evaluation metrics exceeding 0.98. When validated on the 140 new samples, the KNN model achieved a prediction accuracy of 85.7%, confirming its capability to distinguish most FDMM.
Conclusion: The application of ATR-FTIR technology combined with the robust classification capabilities of ML models enabled rapid and accurate differentiation and identification of 55 FDMM, which contributed to ensuring their quality.
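The workflow described in the methods (fit a KNN classifier on labeled spectra, then score it by accuracy, precision, recall, and F1-score) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the software, spectral preprocessing, value of k, distance metric, or averaging scheme, so the synthetic spectra, k = 3, Euclidean distance, and macro-averaging used here are all assumptions, and every function name is hypothetical.

```python
import math
import random
from collections import Counter


def make_spectra(n_classes, n_per_class, n_points, noise, rng):
    """Synthetic stand-in for spectra (assumption): one Gaussian band per class plus noise."""
    X, y = [], []
    for c in range(n_classes):
        center = (c + 1) * n_points / (n_classes + 1)
        for _ in range(n_per_class):
            spectrum = [
                math.exp(-((i - center) ** 2) / 20.0) + rng.gauss(0.0, noise)
                for i in range(n_points)
            ]
            X.append(spectrum)
            y.append(c)
    return X, y


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training spectra closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over the true labels."""
    labels = sorted(set(y_true))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def run_demo(seed=0):
    """Train on one synthetic set, then score on a separately generated set."""
    rng = random.Random(seed)
    train_X, train_y = make_spectra(5, 8, 40, 0.05, rng)
    test_X, test_y = make_spectra(5, 4, 40, 0.05, rng)
    predictions = [knn_predict(train_X, train_y, s, k=3) for s in test_X]
    return macro_metrics(test_y, predictions)


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.3f}")
```

Scoring on a separately generated set mirrors the study's use of a held-out batch of 140 samples: metrics computed on the data used to build a model tend to overstate how it will perform on material collected later.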
About the journal:
Phytochemical Analysis is devoted to the publication of original articles concerning the development, improvement, validation and/or extension of application of analytical methodology in the plant sciences. The spectrum of coverage is broad, encompassing methods and techniques relevant to the detection (including bio-screening), extraction, separation, purification, identification and quantification of compounds in plant biochemistry, plant cellular and molecular biology, plant biotechnology, the food sciences, agriculture and horticulture. The Journal publishes papers describing significant novelty in the analysis of whole plants (including algae), plant cells, tissues and organs, plant-derived extracts and plant products (including those which have been partially or completely refined for use in the food, agrochemical, pharmaceutical and related industries). All forms of physical, chemical, biochemical, spectroscopic, radiometric, electrometric, chromatographic, metabolomic and chemometric investigations of plant products (monomeric species as well as polymeric molecules such as nucleic acids, proteins, lipids and carbohydrates) are included within the remit of the Journal. Papers dealing with novel methods relating to areas such as data handling/data mining in plant sciences will also be welcomed.