Title: Comparative study of indirect and direct feature extraction algorithms in classifying tea varieties using near-infrared spectroscopy
Authors: Xuefan Zhou, Xiaohong Wu, Bin Wu
Journal: Current Research in Food Science, volume 10, article 101065
Published: 2025-04-30 (eCollection)
DOI: 10.1016/j.crfs.2025.101065 (https://doi.org/10.1016/j.crfs.2025.101065)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12099700/pdf/
Journal metrics: impact factor 7.0; JCR Q1 (Food Science & Technology)
Citation count: 0
Abstract
Tea, a globally cherished beverage, has become an integral part of daily life, particularly in China. Given the extensive variety of teas, each distinguished by unique price points, flavors, and health benefits, effective classification within the tea industry is crucial to address the diverse preferences of consumers. This study utilized indirect and direct feature extraction algorithms to analyze the Near-Infrared (NIR) spectra of various tea varieties and compared their classification outcomes. Principal Component Analysis (PCA) was employed as a dimensionality reduction technique for indirect feature extraction algorithms. The study began with the collection of NIR spectra from different tea varieties, followed by the application of three spectral preprocessing algorithms. Indirect and direct feature extraction algorithms were then used to reduce the dimensionality of the preprocessed data. A K-Nearest Neighbors (KNN) classifier analyzed the dimensionality-reduced data to determine classification accuracy. The findings revealed that the classification accuracies of indirect feature extraction algorithms consistently exceeded those of direct feature extraction algorithms, with the former generally surpassing 90.0 %, while the latter remained lower. This indicates that indirect feature extraction algorithms are more adept at handling complex spectral data. A significant decline in classification accuracy was observed when data were processed with Savitzky-Golay (SG). An in-depth analysis led to the development of an optimization plan incorporating the Successive Projections Algorithm (SPA), which effectively enhanced all classification accuracies to above 90 %.
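As a rough illustration of the preprocess-then-classify workflow described above, the following sketch smooths synthetic spectra with a Savitzky-Golay filter and classifies them with KNN, using only the Python standard library. It is not the authors' implementation: the 5-point quadratic SG window, the synthetic single-band "spectra", the labels `variety_A` to `variety_C`, and `k=3` are all assumptions made for illustration. The PCA and feature extraction steps are omitted because the abstract does not name the specific algorithms, and the accuracy on this easy synthetic data says nothing about the accuracies reported in the study.

```python
import math
import random
from collections import Counter

# Standard 5-point quadratic Savitzky-Golay smoothing coefficients.
SG_COEFFS = tuple(c / 35 for c in (-3, 12, 17, 12, -3))


def sg_smooth(spectrum):
    """Smooth one spectrum; the two points at each edge are left unchanged."""
    half = len(SG_COEFFS) // 2
    out = list(spectrum)
    for i in range(half, len(spectrum) - half):
        out[i] = sum(c * spectrum[i + k - half] for k, c in enumerate(SG_COEFFS))
    return out


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training spectra (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test spectra whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def make_spectrum(rng, center, n_points=50, noise=0.05):
    """Hypothetical synthetic spectrum: one Gaussian band plus random noise."""
    return [
        math.exp(-((i - center) ** 2) / 50.0) + rng.gauss(0, noise)
        for i in range(n_points)
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
band_centers = {"variety_A": 15, "variety_B": 25, "variety_C": 35}  # assumed classes

train_x, train_y, test_x, test_y = [], [], [], []
for label, center in band_centers.items():
    for n in range(30):
        spectrum = sg_smooth(make_spectrum(rng, center))
        if n < 20:  # first 20 spectra per class for training
            train_x.append(spectrum)
            train_y.append(label)
        else:  # remaining 10 per class for testing
            test_x.append(spectrum)
            test_y.append(label)

acc = accuracy(train_x, train_y, test_x, test_y, k=3)
print(f"KNN accuracy on synthetic spectra: {acc:.2f}")
```

In a fuller version of this sketch, a dimensionality reduction step would sit between `sg_smooth` and `knn_predict`, so that KNN measures distances in the reduced feature space rather than on the full spectrum.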
About the journal:
Current Research in Food Science is an international peer-reviewed journal dedicated to advancing the breadth of knowledge in the field of food science. It serves as a platform for publishing original research articles and short communications that encompass a wide array of topics, including food chemistry, physics, microbiology, nutrition, nutraceuticals, process and package engineering, materials science, food sustainability, and food security. By covering these diverse areas, the journal aims to provide a comprehensive source of the latest scientific findings and technological advancements that are shaping the future of the food industry. The journal's scope is designed to address the multidisciplinary nature of food science, reflecting its commitment to promoting innovation and ensuring the safety and quality of the food supply.