Journal of Chemometrics: Latest Articles (Page 8)

Stacking Ensemble Learning Method for Quantitative Analysis of Soluble Solid Content in Apples

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-01-13 DOI: 10.1002/cem.3635

Lixin Zhang, Zhensheng Huang, Xiao Zhang

Abstract: The soluble solids content (SSC) of apples directly affects their quality. This study aimed to detect SSC nondestructively using hyperspectral technology combined with chemometrics. However, data generation may not follow a specific pattern, and even small perturbations in the data can have a significant impact on the constructed model. To improve the anti-interference capability of individual models, this study proposed a stacking ensemble learning method that adopted partial least squares (PLS), support vector machine (SVM), extreme gradient boosting (XGBoost), and random forest (RF) as base learners, with RF serving as the meta-learner. Experimental results showed that the performance of the established model on the test set was as follows: the root mean square error (RMSE) was 0.4325, the mean absolute error (MAE) was 0.3245, the mean absolute percentage error (MAPE) was 0.0271, and the coefficient of determination ($R^2$) was 0.9250. These results indicate that the stacking ensemble learning approach could appropriately fuse the predictive results of each base learner and improve on the prediction accuracy of individual models. To verify the superiority of the proposed stacking ensemble learning method, the selection of its base learners, meta-learner, and combination strategy were compared and analyzed. This study not only provides a theoretical reference for the further development of related nondestructive detection equipment but also offers guidance for fusion algorithms.

Volume 39, Issue 1.
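The stacking idea in this abstract can be sketched in a few lines of standard-library Python. This is only an illustration under loud assumptions: the two base learners (`fit_line`, `fit_knn`) and the least-squares meta-learner (`fit_meta`) are hypothetical stand-ins chosen to keep the example short, not the PLS, SVM, XGBoost, and RF models of the paper, and the data are synthetic. What it does show is the general pattern: base learners produce out-of-fold predictions, and a meta-learner is trained on those predictions.

```python
def fit_line(xs, ys):
    """Base learner 1 (stand-in): least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=3):
    """Base learner 2 (stand-in): mean response of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


BASE_FITTERS = [fit_line, fit_knn]


def out_of_fold_predictions(xs, ys, n_folds=3):
    """Predict each sample with base models that never saw it during fitting."""
    n = len(xs)
    oof = [[0.0] * len(BASE_FITTERS) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for m, fitter in enumerate(BASE_FITTERS):
            model = fitter(tx, ty)
            for i in held_out:
                oof[i][m] = model(xs[i])
    return oof


def fit_meta(features, ys):
    """Meta-learner (stand-in): least-squares weights for two base outputs."""
    s11 = sum(f[0] * f[0] for f in features)
    s12 = sum(f[0] * f[1] for f in features)
    s22 = sum(f[1] * f[1] for f in features)
    t1 = sum(f[0] * y for f, y in zip(features, ys))
    t2 = sum(f[1] * y for f, y in zip(features, ys))
    det = s11 * s22 - s12 * s12
    return (t1 * s22 - t2 * s12) / det, (s11 * t2 - s12 * t1) / det


def fit_stack(xs, ys):
    """Train the meta-learner on out-of-fold outputs, then refit bases on all data."""
    w1, w2 = fit_meta(out_of_fold_predictions(xs, ys), ys)
    models = [fitter(xs, ys) for fitter in BASE_FITTERS]
    return lambda x: w1 * models[0](x) + w2 * models[1](x)


# Synthetic, noise-free toy data (hypothetical): y = 2x + 1
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]
stack = fit_stack(xs, ys)
prediction = stack(5.5)
```

Training the meta-learner on out-of-fold rather than in-sample predictions is the design choice that keeps it from simply rewarding whichever base learner overfits most.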

Citations: 0

Robust Multivariate Dispersion Charts for Quality Control: Application to Sulfur Dioxide Monitoring

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-01-10 DOI: 10.1002/cem.3642

Jimoh Olawale Ajadi, Nasir Abbas, Muhammad Riaz, Nurudeen Ayobami Ajadi, Taofeek Adeola Salami, Nurudeen A. Adegoke

Abstract: This study introduces two robust multivariate Shewhart-type control charts based on grouped observations to detect changes in the covariance matrix, with a focus on monitoring sulfur dioxide levels during quality control processes. We compute the covariance matrix of the observations and apply the least absolute shrinkage and selection operator to penalize it in the in-control process. Logarithms are then applied to the eigenvalues derived through singular value decomposition (SVD) of the shrunken covariance matrix, ensuring robustness to non-normality in the multivariate data. The proposed methods offer significant advantages, particularly in their ability to maintain robustness to non-normality without relying on strict distributional assumptions. Performance comparisons using the average run length demonstrate that the proposed charts exhibit superior robustness to normality assumptions compared with existing methods. However, potential limitations include the computational complexity of the shrinkage and SVD processes, which may affect scalability to large datasets. An application to the white wine production process illustrates the effectiveness of the proposed methods for analyzing complex multivariate chemical data. These findings indicate that the introduced charts enhance the detection of shifts in the covariance matrix of physicochemical properties, thereby improving the reliability of quality control processes in non-normal environments. This study provides valuable tools for quality engineers and practitioners in industries dealing with multivariate analytical data, contributing to improved process monitoring and control, ensuring higher quality standards, and ensuring consistent product outcomes in fields such as food science and industrial chemistry.

Volume 39, Issue 1.

Citations: 0

Artificial and Algorithmic Screening of Infrared Spectral Feature Bands of Gastrodia elata to Achieve Rapid Identification of Its Species

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-01-08 DOI: 10.1002/cem.3641

Shuai Liu, Honggao Liu, Jieqing Li, Yuanzhong Wang

Abstract: Gastrodia elata is a traditional Chinese medicine with medicinal and edible value. In this paper, two kinds of datasets were acquired: partial spectra (artificially obtained peak segment spectra) and full spectra (4000–400 cm⁻¹). The competitive adaptive reweighted sampling algorithm (CARS) and the successive projection algorithm (SPA) were used to extract the characteristic variables of the two datasets, and Partial Least Squares Discriminant Analysis (PLS-DA) models, Support Vector Machine (SVM) models, Random Forest (RF) models, and residual convolutional neural networks (ResNet) were established. Among the PLS-DA models, whole-MSC-CARS-PLS-DA was optimal, with a Root Mean Square Error of Prediction (RMSEP) of 0.0658; among the SVM models, Partial-Standard Normal Variate (SNV)-SPA-SVM was the best, with a kernel parameter of 0.1768 and the lowest number of support vectors; among the RF models, Partial-SNV-RF was optimal, but not as effective as the first two models. The loss value of the ResNet model built on the effective information is 0.001; the model is quick to build and uses the original data directly. Therefore, the ResNet model based on feature bands is the most suitable for practical application compared with the other models.

Volume 39, Issue 1.

Citations: 0

Principal Component Analysis: Standardisation

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-01-02 DOI: 10.1002/cem.3607

Richard G. Brereton

Abstract: Standardisation of the columns of a matrix is a common transformation prior to PCA. It can be called by different names, including autoscaling and normalisation. The latter term is confusing terminology, as it is also used for a number of other transformations, so we advise against calling this normalisation. As standardisation is about scaling and not statistical estimation, it is best to use the definition of the population standard deviation $s_j = \sqrt{\sum_{i=1}^{I} \left(x_{ij} - \bar{x}_j\right)^2 / I}$ rather than the sample standard deviation. We can now standardise each matrix. To save room, we just calculate one numerical value so that readers that are interested can check they can reproduce the results from this article. The standardised value for Dataset 1 $x_{83} = 0.566$ (Sample H, variable $x_3$). Hence, whether standardisation prior to PCA is a useful technique depends on the nature of the data and the problem in hand. In some cases, it can degrade patterns, whereas in other situations it can pull out important information. Although standardisation can make a big difference to the appearance of PC plots, in other cases, it makes little or no d

Volume 39, Issue 1. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3607
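The column standardisation described in this abstract, using the population standard deviation (division by I rather than I − 1), might be sketched as follows. The function name `standardise_columns` and the toy matrix are illustrative assumptions; the article's own datasets are not reproduced here.

```python
import math


def standardise_columns(X):
    """Mean-centre each column and divide by its population standard deviation."""
    n_rows, n_cols = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    # Population SD: the sum of squared deviations is divided by I (n_rows),
    # not by I - 1 as in the sample standard deviation.
    sds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n_rows)
        for j in range(n_cols)
    ]
    return [[(row[j] - means[j]) / sds[j] for j in range(n_cols)] for row in X]


# Hypothetical 3 x 2 matrix (rows are samples, columns are variables)
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
Z = standardise_columns(X)
```

After this transformation every column has mean 0 and population variance 1, so variables measured on very different scales contribute equally to a subsequent PCA.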

Citations: 0

Monitoring a Coffee Roasting Process Based on Near-Infrared and Raman Spectroscopy Coupled With Chemometrics

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2024-12-12 DOI: 10.1002/cem.3638

Leah Munyendo, Katharina Schuster, Wolfgang Armbruster, Majharulislam Babor, Daniel Njoroge, Yanyan Zhang, Almut von Wrochem, Alexander Schaum, Bernd Hitzmann

Abstract: Roasting is a fundamental step in coffee processing, where complex reactions form chemical compounds related to the coffee flavor and its health-beneficial effects. These reactions occur on various time scales depending on the roasting conditions. To monitor the process and ensure reproducibility, the study proposes simple and fast techniques based on spectroscopy. This work uses analytical tools based on near-infrared (NIR) and Raman spectroscopy to monitor the coffee roasting process by predicting chemical changes in coffee beans during roasting. Green coffee beans of Robusta and Arabica species were roasted at 240°C for different roasting times. The spectra of the samples were taken using the spectrometers and modeled by k-nearest neighbor regression (KNR), partial least squares regression (PLSR), and multiple linear regression (MLR) to predict concentrations from the spectral data sets. For NIR spectra, all the models provided satisfactory results for the prediction of chlorogenic acid, trigonelline, and DPPH radical scavenging activity, with a low relative root mean square error of prediction (pRMSEP < 9.649%) and a high coefficient of determination ($R^2$ > 0.915). The predictions for ABTS radical scavenging activity were reasonably good. In contrast, the models poorly predicted the caffeine and total phenolic content (TPC). Similarly, all the models based on the Raman spectra provided good prediction accuracies for monitoring the dynamics of chlorogenic acid, trigonelline, and DPPH radical scavenging activity (pRMSEP < 7.849% and $R^2$ > 0.944). The results for ABTS radical scavenging activity, caffeine, and TPC were similar to those of NIR spectra. These findings demonstrate the potential of Raman and NIR spectroscopy methods in tracking chemical changes in coffee during roasting. By doing so, it may be possible to control the quality of coffee in terms of its aroma, flavor, and roast level.

Volume 39, Issue 1. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3638

Citations: 0

Classification of Horsetails Using Predictive Modelling on NIR Spectra

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2024-12-11 DOI: 10.1002/cem.3634

Katharina Beier, Thomas-Martin Dutschmann, Till Beuerle, Marcus Lubienski, Knut Baumann

Abstract: Common horsetail (Equisetum arvense L., syn.: field horsetail) holds a long tradition in the supportive treatment of numerous diseases. A frequently observed problem is the risk of confusing Equisetum arvense plants with another closely related species, Equisetum palustre (syn.: marsh horsetail), due to its morphological similarities. The distinction between the two species during collection/harvest is further complicated by the fact that both species share similar habitats. This, however, is of particular importance because E. palustre contains toxic alkaloids (palustrine and palustridiene) while this is not the case for E. arvense used for medicinal purposes (Equiseti herba). The aim of this study was the classification of horsetails using near infrared spectroscopy (NIR). Therefore, over 370 E. arvense and E. palustre samples originating from all over Germany, consisting of 2 years of harvest, were analysed using two different devices from different manufacturers: (a) a miniature (portable) NIR device and (b) a benchtop NIR device. Initial unsupervised machine learning techniques (PCA and t-SNE) provided insightful visualizations for the distribution of both species within the data space. After applying variable screening to the spectral data, a variety of supervised machine learning models based on different algorithms were trained to predict the species from an individual spectrum. In a repeated cross-validation (CV) approach, it could be shown that the spectra from both spectrometers are sufficient to achieve classification accuracies around 90%. Additionally, the data allowed for discriminating between harvesting seasons as well. The success of the complete workflow is further emphasized by assessing its reliability through posterior probabilities, which were high for the predicted class labels, implying a satisfying model certainty.

Volume 39, Issue 1. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3634

Citations: 0

Assessment of Conformal Prediction and Standard Normal Distribution for Autonomous Consensus One-Class Classification

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2024-12-10 DOI: 10.1002/cem.3639

Hyrum J. Redd, John H. Kalivas

Abstract: Determining if target samples are members of a particular source class of samples has a large variety of applications within many disciplines. In particular, one-class classification (OCC) is essential in many areas, such as food contamination or product authentication. There are numerous widely accepted methods for OCC, but these OCC methods involve optimizing tuning parameters such as the number of principal components (PCs). This study presents the development and application of a rigorous autonomous OCC process based on a hybrid fusion consensus technique, termed consensus OCC (Con OCC). The Con OCC method uses the new physicochemical responsive integrated similarity measure (PRISM) composed of multiple similarity measures all independent of optimization. Similarity values are fused to a single value describing the degree of sample similarity to a collection of samples. Two approaches are developed to translate each sample-wise PRISM value to a probability of class membership: conformal prediction p-values and z-scores. These two methods are evaluated as separate Con OCC processes using seven datasets measured across a variety of instruments. In both cases, class membership labels are not used to set decision thresholds, and classifiers are not optimized relative to respective tuning parameters. Results indicate that z-scoring often produces better results, but conformal prediction provides greater consistency across datasets. That is, z-score values tend to range across datasets while conformal prediction p-values do not.

Volume 39, Issue 1.
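The two ways of turning a per-sample score into a class-membership probability can be contrasted with a small sketch. The PRISM measure itself is not defined in the abstract, so the code assumes a generic nonconformity score in which larger values mean "less like the source class"; the function names, the calibration values, and the use of an upper-tail normal probability for the z-score are all illustrative assumptions rather than the authors' procedure.

```python
import math
import statistics


def conformal_p_value(calibration_scores, score):
    """Share of calibration scores at least as extreme as the new score.

    The +1 terms count the new sample itself, the usual conformal convention.
    """
    n_at_least = sum(1 for s in calibration_scores if s >= score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def z_score_probability(calibration_scores, score):
    """Upper-tail standard normal probability of the new score's z-score."""
    mean = statistics.mean(calibration_scores)
    sd = statistics.stdev(calibration_scores)
    z = (score - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical nonconformity scores for known members of the source class
calibration = [0.8, 1.0, 1.1, 1.3, 0.9, 1.2, 1.0, 1.1, 0.7]

p_outlier = conformal_p_value(calibration, 5.0)    # far from the class
p_typical = conformal_p_value(calibration, 0.0)    # more typical than all members
z_outlier = z_score_probability(calibration, 5.0)
```

The conformal p-value depends only on ranks, so it is bounded below by 1/(n + 1) whatever the scale of the scores, whereas the z-score route assumes approximate normality and its values can vary widely with the spread of each dataset.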

Citations: 0

Soft Sensor Modeling of Acrylic Acid Yield Based on Autoencoder Long Short-Term Memory Neural Network of Savitzky–Golay and ReliefF Algorithm

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2024-12-10 DOI: 10.1002/cem.3640

Shuting Liu, Wenbo Zhang, Hangfeng He, Shumei Zhang

Citations: 0

Automation of Local Regression Model Building for Spectroscopic Data

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2024-12-09 DOI: 10.1002/cem.3637

Randy J. Pell, L. Scott Ramos, Brian Rohrback

Citations: 0

Normalization Strategies for Lipidome Data in Cell Line Panels

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2024-12-08 DOI: 10.1002/cem.3636

Hanneke Leegwater, Zhengzheng Zhang, Xiaobing Zhang, Thomas Hankemeier, Amy C. Harms, Annelien J. M. Zweemer, Sylvia E. Le Dévédec, Alida Kindt

Abstract: Sample collection can significantly affect lipid concentration measurements in cell line panels, concealing intrinsic differences between cancer subtypes. Most quality control steps in lipidomic data analysis focus on controlling technical variation. Correcting for the total amount of biological material remains an additional challenge for cell line panels. Here, we investigated how we can normalize lipidomic data acquired from multiple cell lines to correct for differences in sample biomass. We studied how commonly used data normalization and transformation strategies influence the resulting lipid data distributions. We compared normalization by biological properties, including cell count and total protein concentration, to statistical and data-based approaches, such as median, mean, or probabilistic quotient-based normalization. We used intraclass correlations to estimate how normalization influenced the similarity between replicates. Normalizing lipidomic data by cell count improved the similarity between replicates but only for cell lines with similar morphologies. When comparing cell line panels with diverse morphologies, neither cell count nor protein concentration was sufficient to increase the similarity of lipid abundances between cell line replicates. Data-based normalizations increased these similarities but resulted in a bias towards the large and variable lipid class of triglycerides. These artifacts are reduced by normalizing for the abundance of only structural lipids. We conclude that there is a delicate balance between improving the similarity between replicates and avoiding artifacts in lipidomic data and emphasize the importance of an appropriate normalization strategy in studying biological phenomena using lipidomics.

Volume 39, Issue 1. Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3636
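Two of the data-based normalizations named in this abstract, median and probabilistic quotient normalization (PQN), can be sketched as below. The function names and the toy intensity table are hypothetical, the reference profile is taken as the feature-wise median across samples (one common choice, assumed here), and the sketch omits any preprocessing the authors may have applied beforehand.

```python
import statistics


def median_normalise(samples):
    """Divide every sample (row) by its own median intensity."""
    return [[v / statistics.median(row) for v in row] for row in samples]


def pqn_normalise(samples):
    """Probabilistic quotient normalization against a median reference profile."""
    n_features = len(samples[0])
    # Reference profile: feature-wise median across all samples
    reference = [
        statistics.median(row[j] for row in samples) for j in range(n_features)
    ]
    normalised = []
    for row in samples:
        # The median of the feature-wise quotients estimates the sample's dilution
        dilution = statistics.median(v / r for v, r in zip(row, reference))
        normalised.append([v / dilution for v in row])
    return normalised


# Hypothetical lipid intensities: the same profile at 1x, 2x and 0.5x biomass
samples = [
    [1.0, 2.0, 4.0, 8.0],
    [2.0, 4.0, 8.0, 16.0],
    [0.5, 1.0, 2.0, 4.0],
]
pqn = pqn_normalise(samples)
med = median_normalise(samples)
```

In this idealised case both methods remove the biomass difference completely. With real data, a few large and variable features (such as the triglycerides mentioned above) can pull the estimated scaling factor, which is the kind of bias the abstract warns about.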

Citations: 0