Jiacong Ping, Nan Hao, Xuting Guo, Peiqi Miao, Zhiqi Guan, Haiyang Chen, Changqing Liu, Gang Bai, Wenlong Li
Title: Rapid and accurate identification of Panax ginseng origins based on data fusion of near-infrared and laser-induced breakdown spectroscopy
DOI: 10.1016/j.foodres.2025.115925
Journal: Food Research International, volume 204, Article 115925 (impact factor 7.0; JCR Q1, Food Science & Technology)
Publication date: 2025-02-06
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0963996925002625
Citation count: 0
Abstract
This study combines laser-induced breakdown spectroscopy (LIBS) and near-infrared spectroscopy (NIR) with data processing and fusion methods to accurately trace the origin of Panax ginseng. The isolation forest algorithm was first applied to remove outliers and ensure the quality of the dataset. Classification models based on random forest (RF), support vector machine (SVM), and stochastic gradient descent (SGD) classifiers were then built on the LIBS and NIR spectral data, and their performance was optimized by comparing preprocessing techniques and variable selection methods. For LIBS data, standard normal variate (SNV) preprocessing combined with sequential forward selection (SFS) and the SVM model performed best; for NIR data, the second derivative (2nd Der) combined with multiplicative scatter correction (MSC), least absolute shrinkage and selection operator (LASSO) variable selection, and the RF model was most effective. Among the data fusion strategies compared, the ensemble learning-based fusion model outperformed the outer product fusion model, which in turn outperformed the mid-level data fusion model. The ensemble learning-based fusion model achieved a prediction accuracy of 99.0% on the independent prediction set, with a Kappa value of 0.982, an F1 score of 0.990, and a Brier score of 0.009. An analysis of elemental importance showed that Fe, Mg, Na, and Ca were the most significant elements for distinguishing Panax ginseng from different origins, with O, Cu, Al, K, Mn, Ba, and Cl also being important. In conclusion, this study proposes an effective data fusion method combining LIBS and NIR that achieves high traceability accuracy and provides a theoretical foundation and technical support for quality control and traceability of food and agricultural products.
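Several building blocks named in the abstract can be illustrated with a minimal sketch: SNV preprocessing, mid-level and outer product fusion of two feature vectors, and the Kappa and Brier metrics. This is not the authors' implementation. All function names are hypothetical, the inputs are assumed to be plain Python lists of already-selected features, and the multi-class Brier definition used here (mean over samples of the summed squared error against a one-hot label) is an assumption, since the abstract does not state which definition was used.

```python
from collections import Counter
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean and
    scale it by its own standard deviation (population form, an assumption)."""
    mu = mean(spectrum)
    sd = pstdev(spectrum)
    if sd == 0:
        raise ValueError("flat spectrum: SNV is undefined")
    return [(x - mu) / sd for x in spectrum]


def mid_level_fusion(libs_features, nir_features):
    """Mid-level fusion: concatenate the selected features of both blocks."""
    return list(libs_features) + list(nir_features)


def outer_product_fusion(libs_features, nir_features):
    """Outer product fusion: every pairwise product of one LIBS feature and
    one NIR feature, flattened row by row into a single vector."""
    return [a * b for a in libs_features for b in nir_features]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def brier_score(probabilities, true_class_indices):
    """Multi-class Brier score (assumed definition): mean over samples of
    the squared distance between predicted probabilities and the one-hot label."""
    total = 0.0
    for probs, true_idx in zip(probabilities, true_class_indices):
        total += sum(
            (p - (1.0 if k == true_idx else 0.0)) ** 2
            for k, p in enumerate(probs)
        )
    return total / len(probabilities)
```

With m LIBS features and n NIR features, mid-level fusion yields m + n inputs for the classifier while outer product fusion yields m × n interaction terms, which is why variable selection before fusion matters for keeping the fused vector manageable.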
About the journal:
Food Research International serves as a rapid dissemination platform for significant and impactful research in food science, technology, engineering, and nutrition. The journal focuses on publishing novel, high-quality, and high-impact review papers, original research papers, and letters to the editors across various disciplines in the science and technology of food. Additionally, it follows a policy of publishing special issues on topical and emergent subjects in food research or related areas. Selected, peer-reviewed papers from scientific meetings, workshops, and conferences on the science, technology, and engineering of foods are also featured in special issues.