Katharina Beier, Thomas-Martin Dutschmann, Till Beuerle, Marcus Lubienski, Knut Baumann
Classification of Horsetails Using Predictive Modelling on NIR Spectra
Journal of Chemometrics, 39(1). Published 2024-12-11. DOI: 10.1002/cem.3634
https://onlinelibrary.wiley.com/doi/10.1002/cem.3634
Abstract
Common horsetail (Equisetum arvense L., syn.: field horsetail) has a long tradition in the supportive treatment of numerous diseases. A frequently observed problem is the risk of confusing Equisetum arvense with the closely related species Equisetum palustre (syn.: marsh horsetail), because the two are morphologically similar. Distinguishing the two species during collection or harvest is further complicated by the fact that they share similar habitats. The distinction is nevertheless of particular importance because E. palustre contains toxic alkaloids (palustrine and palustridiene), whereas E. arvense, which is used for medicinal purposes (Equiseti herba), does not. The aim of this study was to classify horsetails using near-infrared (NIR) spectroscopy. To this end, over 370 E. arvense and E. palustre samples collected all over Germany in two harvest years were analysed using two devices from different manufacturers: (a) a miniature (portable) NIR device and (b) a benchtop NIR device. Initial unsupervised machine learning techniques (PCA and t-SNE) provided informative visualizations of the distribution of both species within the data space. After variable screening of the spectral data, a variety of supervised machine learning models based on different algorithms were trained to predict the species from an individual spectrum. A repeated cross-validation (CV) approach showed that the spectra from both spectrometers are sufficient to achieve classification accuracies of around 90%. The data also allowed discrimination between harvesting seasons. The reliability of the complete workflow was further assessed through posterior probabilities, which were high for the predicted class labels, implying satisfactory model certainty.
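The supervised part of the workflow described above (variable screening, a classifier, repeated cross-validation, and posterior probabilities) could be sketched roughly as follows. This is a minimal illustration with scikit-learn and not the authors' implementation: the spectra are synthetic, and the choice of an F-test screen (`SelectKBest`), linear discriminant analysis, 5 folds with 3 repeats, and all numeric settings are assumptions made for the example. The abstract does not state which algorithms or settings were used.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for NIR spectra (NOT the study's data):
# 200 samples x 200 wavelength channels, two classes.
n_per_class, n_wavelengths = 100, 200
X = rng.normal(size=(2 * n_per_class, n_wavelengths))
y = np.repeat([0, 1], n_per_class)  # 0/1 are illustrative species labels
X[y == 1, 50:60] += 0.8  # assumed class-dependent shift in 10 channels

# Screening sits inside the pipeline, so it is refit on each training
# fold and never sees the held-out samples.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=20),  # assumed screening method and size
    LinearDiscriminantAnalysis(),  # assumed classifier
)

# Repeated stratified cross-validation: 5 folds x 3 repeats = 15 scores.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Posterior probabilities on a held-out split as a rough certainty check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
posteriors = model.predict_proba(X_test)  # shape (n_test, 2)
max_posterior = posteriors.max(axis=1)  # posterior of the predicted class
print(f"Median posterior of predicted class: {np.median(max_posterior):.3f}")
```

Keeping the screening step inside the pipeline is a design choice of this sketch. Screening on the full data set before cross-validation would let information from the test folds leak into the selection and tend to inflate the estimated accuracy.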
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.