Feature extraction of fluorescence excitation-emission matrices using PCA fused with Wilks Λ-statistic and FDA for origin identification and active components content prediction of sweet basil
Wenfei Du, Yong Yin, Hao Wu, Yunxia Yuan, Junliang Chen, Yunfeng Xu, Huichun Yu
Journal of Food Measurement and Characterization, 18 (12), pp. 9971-9982. Published 2024-11-01. DOI: 10.1007/s11694-024-02935-7
https://link.springer.com/article/10.1007/s11694-024-02935-7
Abstract
Sweet basil is a commonly used food spice and traditional medicine in China, and geographical differences have a significant impact on its content of active ingredients. In this study, a feature extraction strategy for fluorescence data, using principal component analysis (PCA) fused with the Wilks Λ-statistic and Fisher discriminant analysis (FDA), was proposed for rapid discrimination and quantitative detection of sweet basil from different origins. After pretreatment of the fluorescence excitation-emission matrices, 8 feature emission wavelengths were extracted using PCA combined with the Wilks Λ-statistic. The fluorescence excitation-emission matrices corresponding to these feature emission wavelengths were then fused by FDA, and the first three FD variables, with a cumulative discriminant power of 99%, were selected as feature vectors. Finally, extreme learning machine (ELM) and random forest (RF) models were constructed for sweet basil origin identification, and the back propagation neural network (BPNN) algorithm was employed for rapid prediction of linalool and flavonoid content in sweet basil. The results showed that, compared with the RF model, the ELM model was more suitable for identifying sweet basil from different origins, with an accuracy of 98%. The coefficient of determination (R²) and root mean square error (RMSE) of the BPNN-based linalool content prediction model were 0.984 and 0.131, respectively; those of the BPNN-based flavonoid content prediction model were 0.969 and 0.019, respectively. These results indicate that the proposed feature extraction method showed good generalization ability and robustness, and that it provides an alternative feature selection method for rapid identification of food origin and evaluation of food quality.
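The abstract does not say exactly how PCA and the Wilks Λ-statistic are combined, so the following is only a minimal sketch of two ingredients it mentions, not the authors' procedure. It assumes a univariate Wilks Λ (within-group sum of squares divided by total sum of squares) is used to rank candidate variables, and that "cumulative discriminant power" means the cumulative share of the discriminant eigenvalues. The names `wilks_lambda`, `rank_by_wilks` and `components_needed`, and the toy data, are hypothetical.

```python
from collections import defaultdict


def wilks_lambda(values, labels):
    """Univariate Wilks' lambda: within-group SS divided by total SS.

    Values near 0 mean the groups are well separated on this variable;
    values near 1 mean the variable carries little group information.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 1.0  # constant variable: no discriminating information

    ss_within = 0.0
    for members in groups.values():
        group_mean = sum(members) / len(members)
        ss_within += sum((v - group_mean) ** 2 for v in members)
    return ss_within / ss_total


def rank_by_wilks(columns, labels):
    """Return variable names sorted from most to least discriminating."""
    return sorted(columns, key=lambda name: wilks_lambda(columns[name], labels))


def components_needed(eigenvalues, threshold=0.99):
    """Smallest number of leading axes whose eigenvalue share reaches threshold."""
    total = sum(eigenvalues)
    running = 0.0
    for count, eigenvalue in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += eigenvalue
        if running / total >= threshold:
            return count
    return len(eigenvalues)


if __name__ == "__main__":
    # Toy data (made up): two origins, two hypothetical emission wavelengths.
    origins = ["A", "A", "A", "B", "B", "B"]
    intensities = {
        "em_1": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],  # differs strongly by origin
        "em_2": [1.0, 3.0, 2.0, 1.0, 3.0, 2.0],  # identical across origins
    }
    print(rank_by_wilks(intensities, origins))
    print(components_needed([8.0, 1.5, 0.45, 0.05], threshold=0.99))
```

In this toy data, `em_1` separates the two origins and so gets a Λ close to 0, while `em_2` has the same distribution in both groups and gets Λ = 1. A multivariate Λ (a ratio of determinants of the within-group and total scatter matrices) would be needed to assess several variables jointly.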
About the journal:
This interdisciplinary journal publishes new measurement results, characteristic properties, differentiating patterns, measurement methods and procedures for such purposes as food process innovation, product development, quality control, and safety assurance.
The journal encompasses all topics related to food property measurement and characterization, including all types of measured properties of food and food materials, features and patterns, measurement principles and techniques, development and evaluation of technologies, novel uses and applications, and industrial implementation of systems and procedures.