ATR-FTIR Coupled With Machine Learning Provides a Fast Method for Identifying and Distinguishing 55 Varieties of Fruit-Derived Medicinal Materials.

IF 3.0; Zone 3 (Biology); JCR Q2 (Biochemical Research Methods)
Wen-Jie Zhao, Ya-Ling An, Chun-Qian Song, Yu-Shi Huang, Li-Jie Zhang, Kang-Nan Liu, Zhen-Wei Li, Xiao-Kang Liu, Dai-di Zhang, De-An Guo
Journal: Phytochemical Analysis. Publication date: 2025-05-29. Publication type: Journal Article. DOI: 10.1002/pca.3545 (https://doi.org/10.1002/pca.3545)
Citation count: 0

Abstract

Introduction: Fruit-derived medicinal materials (FDMM) are extensively utilized in daily life, yet the market is beset by substantial variety confusion, which undermines consumer rights and well-being. Consequently, accurate identification of these materials is essential for guaranteeing their quality, effectiveness, and safety.

Objectives: This study aimed to combine attenuated total reflection-Fourier transform infrared spectroscopy (ATR-FTIR) and machine learning (ML) techniques to differentiate and identify 55 kinds of FDMM.

Materials and methods: A total of 861 sample batches were collected, with 721 allocated for model establishment and 140 for independent validation. A partial least squares discriminant analysis (PLS-DA) model and nine machine learning models were constructed: support vector machine (SVM), tree, K-nearest neighbor (KNN), discriminant, ensemble, support vector machine kernel (SVMK), logistic regression kernel (LRK), naive Bayes (NB), and neural network (NN). The optimal model was selected on the basis of both accuracy and computational efficiency, then evaluated in terms of accuracy, precision, recall, and F1-score. It was further validated on 140 newly collected samples to test its long-term stability after several months.
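
The classification and evaluation steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the spectra are synthetic (one Gaussian band per class plus noise), and the Euclidean distance, k = 3, the macro-averaging of precision, recall, and F1, and every function name are assumptions made only for illustration.

```python
import math
import random
from collections import Counter


def synthetic_spectrum(rng, center, n_points=60, noise=0.02):
    """Hypothetical stand-in for an ATR-FTIR spectrum: one Gaussian band plus noise."""
    return [math.exp(-((i - center) ** 2) / 50.0) + rng.gauss(0.0, noise)
            for i in range(n_points)]


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training spectra (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


# Build a toy data set: 3 made-up classes, 20 spectra each, 15 for training and 5 held out.
rng = random.Random(0)
band_centers = {"variety_A": 15, "variety_B": 30, "variety_C": 45}  # hypothetical classes
train_X, train_y, test_X, test_y = [], [], [], []
for label, center in band_centers.items():
    spectra = [synthetic_spectrum(rng, center) for _ in range(20)]
    train_X += spectra[:15]
    train_y += [label] * 15
    test_X += spectra[15:]
    test_y += [label] * 5

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
metrics = macro_metrics(test_y, predictions)
print(metrics)
```

Real spectra would normally be preprocessed before classification, and with 55 classes the per-class metrics are worth inspecting alongside the averages.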

Results: Among the 10 classification models, the KNN model showed exceptional classification capability, with all evaluation metrics exceeding 0.98. Validation on the 140 new samples gave a prediction accuracy of 85.7%, confirming that the model can distinguish most FDMM.
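
As a quick check on the reported figures, the short sketch below confirms that the two subsets add up to the 861 batches and works out how many correct predictions are consistent with 85.7% on 140 samples. The abstract does not state that count; it is inferred here on the assumption that each validation sample is scored once and that the percentage is rounded to one decimal place.

```python
total_batches = 861
modeling_batches = 721
validation_batches = 140

# The two subsets reported in the abstract should account for every batch.
assert modeling_batches + validation_batches == total_batches

# Find every count of correct predictions that rounds to the reported accuracy.
reported_accuracy_pct = 85.7
matches = [n for n in range(validation_batches + 1)
           if round(100 * n / validation_batches, 1) == reported_accuracy_pct]
print(matches)  # -> [120]
```

Under those assumptions, 120 of the 140 validation samples were classified correctly and 20 were not.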

Conclusion: The application of ATR-FTIR technology combined with the robust classification capabilities of ML models enabled rapid and accurate differentiation and identification of 55 FDMM, which contributed to ensuring their quality.

Source journal: Phytochemical Analysis
Category: Biology - Analytical Chemistry
CiteScore: 6.00
Self-citation rate: 6.10%
Articles published: 88
Review time: 1.7 months
About the journal: Phytochemical Analysis is devoted to the publication of original articles concerning the development, improvement, validation and/or extension of application of analytical methodology in the plant sciences. The spectrum of coverage is broad, encompassing methods and techniques relevant to the detection (including bio-screening), extraction, separation, purification, identification and quantification of compounds in plant biochemistry, plant cellular and molecular biology, plant biotechnology, the food sciences, agriculture and horticulture. The Journal publishes papers describing significant novelty in the analysis of whole plants (including algae), plant cells, tissues and organs, plant-derived extracts and plant products (including those which have been partially or completely refined for use in the food, agrochemical, pharmaceutical and related industries). All forms of physical, chemical, biochemical, spectroscopic, radiometric, electrometric, chromatographic, metabolomic and chemometric investigations of plant products (monomeric species as well as polymeric molecules such as nucleic acids, proteins, lipids and carbohydrates) are included within the remit of the Journal. Papers dealing with novel methods relating to areas such as data handling/data mining in plant sciences will also be welcomed.