Paul-Albert Schneide, Michael Sorochan Armstrong, Neal Gallagher, Rasmus Bro
{"title":"Unlocking New Capabilities in the Analysis of GC × GC-TOFMS Data With Shift-Invariant Multi-Linearity","authors":"Paul-Albert Schneide, Michael Sorochan Armstrong, Neal Gallagher, Rasmus Bro","doi":"10.1002/cem.3623","DOIUrl":"10.1002/cem.3623","url":null,"abstract":"<div><p>This paper introduces a novel deconvolution algorithm, shift-invariant multi-linearity (SIML), which significantly enhances the analysis of data from two-dimensional gas chromatography instruments coupled to a time-of-flight mass spectrometer (GC × GC-TOFMS). Designed to address the challenges posed by retention time shifts and high noise levels, SIML incorporates wavelet-based smoothing and Fourier-transform-based shift correction within the multivariate curve resolution-alternating least squares (MCR-ALS) framework. We benchmarked the SIML algorithm against non-negativity constrained MCR-ALS and parallel factor analysis 2 with flexible coupling (PARAFAC2 × N) using both simulated and real GC × GC-TOFMS datasets. Our results demonstrate that SIML provides unique solutions with significantly improved robustness, particularly in low signal-to-noise ratio scenarios, where it maintains high accuracy in estimating mass spectra and concentrations. 
The enhanced reliability of quantitative analyses afforded by SIML underscores its potential for broad application in complex matrix analyses across environmental science, food science, and biological research.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 1","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143112315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Variable Selection for Near-Infrared Spectroscopy Based on Simulated Annealing Bee Colony Algorithm","authors":"Jianfei Shi, Baihong Tong, Jinming Liu, Zhengguang Chen, Pengfei Li, Chong Tan","doi":"10.1002/cem.3633","DOIUrl":"10.1002/cem.3633","url":null,"abstract":"<div><p>Variable selection is an effective method to enhance the modeling performance of near-infrared spectroscopy. Given the promising application prospects of intelligent optimization algorithms in spectral feature variable selection, this article combines the artificial bee colony algorithm with the simulated annealing algorithm to construct a simulated annealing bee colony algorithm (SABC). To explore the feasibility of SABC for spectral variable selection, SABC was applied to construct partial least squares quantitative detection models for the cellulose content of corn stover and the organic matter content of soil. The modeling performance was compared with that of the full spectrum, the genetic algorithm, the simulated annealing algorithm, and the artificial bee colony algorithm; the model established by SABC had the best regression precision. For the cellulose and organic matter content detection models, the coefficients of determination of the validation set were 0.9433 and 0.9853, with relative root mean squared errors of 1.7901% and 0.8011% and residual prediction deviations of 4.1741 and 8.3931, respectively, which could meet the corresponding practical detection needs. 
SABC adopted a strategy of multiple runs to select the repeatedly chosen wavelength variables, which effectively reduced variable dimensions and model complexity, improved the prediction performance of the regression model, and provided a new approach for building a high-performance near-infrared spectroscopy (NIRS) quantitative calibration model.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 1","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143119265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Principal Components Analysis: Centring and Rotations","authors":"Richard G. Brereton","doi":"10.1002/cem.3610","DOIUrl":"10.1002/cem.3610","url":null,"abstract":"<p>It is rare to perform PCA on raw data, and usually some transformation or pre-processing is performed prior to PCA.</p><p>We will illustrate the principles of centring using four 6 × 2 datasets numbered 1 to 4 as presented in Table 1, consisting of objects A to F, and will use primarily graphical approaches. As many readers will choose to centre data prior to PCA, having a good understanding is useful. Readers should be able to reproduce corresponding numerical results using their favourite package, but some of the graphics might not be available in most software.</p><p>So far, this article has primarily been about geometry, but what are the practical consequences? The first and most obvious is that centring will usually change the patterns of the scores. When there are more than two components, rotations can be very complicated algebraically, and we do not have room in this introductory article to discuss this in detail; a 3D rotation, for example, is often expressed as three separate 2D rotations around each of the axes [5, 6]. Changes in signs can be viewed as reflections in hyperplanes. The directions of PC axes can vary considerably according to whether the data are centred and often show much greater deviation than in these simple two-dimensional examples.</p><p>As such, centring is not straightforward to visualise and understand when the number of variables is large. Whether it is appropriate to centre depends on the questions being asked. For example, in spectroscopy of mixtures, we may be primarily interested in the properties of individual chemical components above a baseline, so centring might not be appropriate. In many areas, we are interested in variability around a mean, and so it is best to centre the data. 
This article discusses the influence of column centring on PC scores.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3610","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Max Feinberg, Stephan Clémençon, Serge Rudaz, Julien Boccard
{"title":"Kernel-Based Bootstrap Synthetic Data to Estimate Measurement Uncertainty in Analytical Sciences","authors":"Max Feinberg, Stephan Clémençon, Serge Rudaz, Julien Boccard","doi":"10.1002/cem.3628","DOIUrl":"10.1002/cem.3628","url":null,"abstract":"<p>Measurement uncertainty (MU) is becoming a key figure of merit for analytical methods, and estimating MU from method validation data is cost-effective and practical. Since MU can be defined as a coverage interval of a given result, the computation of statistical prediction intervals is a possible approach, but the quality of the intervals is questionable when the number of available data is reduced. In this context, the bootstrap procedure constitutes an efficient strategy to increase the observed data variability. While applying naive bootstrap to validation data raises some computational challenges, the use of smooth bootstrap is much more interesting when synthetic data are generated using an adapted kernel density estimation algorithm. MU can be directly obtained in a very convenient way as an uncertainty function applicable to any unknown future measurement. This publication presents the advantages and disadvantages of this new method illustrated using diverse in-house and interlaboratory validation data.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3628","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Detection of Stabilizer Content in Double-Base Propellant Based on Artificial Neural Network Combined With Near-Infrared Spectroscopy","authors":"Dihua Ouyang, Tianyu Cui, Qiantao Zhang, Haoxiang Dai, Xiaowen Qin, Yaoli Hu","doi":"10.1002/cem.3632","DOIUrl":"10.1002/cem.3632","url":null,"abstract":"<div><p>During long-term storage, double-base propellants are prone to chemical decomposition of internal nitrate esters, leading to decreased burn rate, reduced strength, and degraded ballistic performance. Adding an appropriate amount of Centralite-II is crucial for ensuring storage safety. This study proposes a novel method combining near-infrared spectroscopy (NIRS) with artificial intelligence to rapidly and non-destructively detect the content of Centralite-II in double-base propellants. The optimal modeling wavelength ranges of 4000–4600 cm<sup>−1</sup> and 5700–6100 cm<sup>−1</sup> were identified, and the raw spectral data were preprocessed using standard normal variate (SNV) transformation to improve the signal-to-noise ratio. Principal component analysis (PCA) was then applied to reduce data dimensionality, and the first three principal components were used as inputs for a backpropagation artificial neural network (BP-ANN). The resulting PCA-BP-ANN model showed excellent performance on the training set, with an $R_c^2$ of 0.9830 and an RMSEC of 0.0376%. 
During independent validation, the model demonstrated strong generalization ability, achieving an $R_p^2$ of 0.9824 and an RMSEP of 0.3179%. Comparative analysis with other models, including BP, PLS, ELM, SVR, and LSTM, indicated that the PCA-BP-ANN model exhibited superior prediction accuracy and generalization capability. This method provides a rapid and non-destructive approach for assessing the stabilizer content in double-base propellants and expands the application of NIRS and AI techniques in the field of energetic materials.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Process Fault Detection and Diagnosis Method Based on Factor Analysis: Application on the Three-Tank System Process","authors":"Cheng Zhang, Ze-hao Xu, Yu-yu Lao, Yuan Li","doi":"10.1002/cem.3627","DOIUrl":"10.1002/cem.3627","url":null,"abstract":"<div><p>To address the underreporting of tiny faults by dynamic factor analysis (DFA), a novel fault detection and diagnosis method based on DFA-sliding window combined with mean square error (DFA-SWMSE) is proposed. Firstly, the data matrix is augmented by introducing time lag shifts. Secondly, factor analysis (FA) is applied to the augmented data matrix, achieving dimensionality reduction and feature extraction while retaining most of the original data's information. Then, the sliding window technique is applied to calculate the mean square error of the dimensionally reduced data, allowing for the monitoring of the system's current state and the detection of tiny faults. Finally, effective fault diagnosis is achieved through the analysis of fault factors and variable contributions. The proposed method is validated using a complex dynamic numerical example and a three-tank system process named Sim3Tanks. This system has gained widespread application in the field of process fault detection due to its ability to simulate and generate various types of faults. The proposed method is compared with principal component analysis (PCA), dynamic principal component analysis (DPCA), PCA similarity factor (SPCA), FA, and DFA. 
The experimental results validate the effectiveness of the proposed method in detecting and diagnosing tiny faults in dynamic processes.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tuomas Korpinsalo, Juhana Rautavirta, Sami Huhtala, Tapani Reinikainen, Jukka Corander
{"title":"Forensic Comparison of Amphetamine Chemical Profiles by Bayesian Predictive Modelling","authors":"Tuomas Korpinsalo, Juhana Rautavirta, Sami Huhtala, Tapani Reinikainen, Jukka Corander","doi":"10.1002/cem.3630","DOIUrl":"10.1002/cem.3630","url":null,"abstract":"<div><p>Forensic chemists frequently employ statistical profiling approaches to assess the degree of similarity between samples of illicit drugs. Such profiling information can help reveal connections between nodes in distribution networks and manufacturing laboratories. For amphetamine, the routine method of comparing a pair of samples includes the use of a dissimilarity measure based on the Pearson correlation coefficient calculated between their chemical profiles obtained through gas chromatography–mass spectrometry. This simple measure of (dis)similarity has been shown to distinguish pairs sharing a common origin (e.g., same production batch) to a reasonable level of accuracy. However, Pearson correlation fails to capture all the relevant notions of similarity between chemical profiles of amphetamine. We present a new statistical method for forensic drug comparison that uses a more sophisticated statistical modelling approach to determine similarity between samples. We show that this leads to improved performance over the correlation-based approach. 
The proposed method is easily extendable and has an intuitive interpretation from both chemistry and forensic perspectives, which supports wide applicability to illicit drug profiling in practice.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chemometrics: A Vital Implement for Understanding the Water Structures by Near-Infrared Spectroscopy","authors":"Haipeng Wang, Li Han, Wensheng Cai, Xueguang Shao","doi":"10.1002/cem.3631","DOIUrl":"10.1002/cem.3631","url":null,"abstract":"<div><p>Water structures play an important role in chemical and biological systems, because the structure and function of a molecule may depend on the structure of the water with which the molecule interacts. Near-infrared (NIR) spectroscopy has been proven to be powerful in analyzing the structure of water due to its sensitive response to OH. However, chemometrics is vitally important in the analysis of the NIR spectrum of water due to the low resolution of the spectrum and the complexity of the water structures. In this review, chemometric methods for structural analysis of water in aqueous systems, particularly in chemical and biological processes, by NIR spectroscopy are summarized, from the improvement of spectral resolution to the effective extraction of the spectral information of different water structures. Through the changes of the spectral features of the water structures, the structural transformation of proteins, thermo-responsive polymers, and antifreeze agents, as well as the structural variation of water in the transformation, were elucidated. 
Water proved to be a good probe for analyzing the structure and interactions in aqueous solutions and chemical/biological processes by NIR spectroscopy.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ying Tian, Jian Shen, Ao Wang, Zeqiu Li, Xiuhui Huang
{"title":"Data Augmentation and Fault Diagnosis for Imbalanced Industrial Process Data Based on Residual Wasserstein Generative Adversarial Network With Gradient Penalty","authors":"Ying Tian, Jian Shen, Ao Wang, Zeqiu Li, Xiuhui Huang","doi":"10.1002/cem.3624","DOIUrl":"10.1002/cem.3624","url":null,"abstract":"<div><p>In practical industrial applications, equipment usually operates normally and failures are relatively rare, resulting in serious imbalances in the collected data. This imbalance leads to issues such as overfitting, instability, and poor robustness, significantly reducing the accuracy and stability of fault diagnosis systems. To address these challenges, this research proposes a method for imbalanced data augmentation and industrial process fault diagnosis based on an improved Generative Adversarial Network (GAN). The method adopts the Wasserstein distance with gradient penalty and integrates residual connections into the architecture of the generator. This innovation not only helps improve gradient transfer in the generator, but also significantly enhances the data generation capabilities of the generative model by improving the stability of training. The generative model uses limited industrial process data to produce synthetic samples with high similarity and diversity. These high-quality samples improve fault diagnosis by enriching the imbalanced dataset. 
Experimental results on two industrial datasets confirm the method's effectiveness in enhancing fault diagnosis performance with limited data.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Past, Present and Future of Research in Analytical Figures of Merit","authors":"Alejandro Olivieri","doi":"10.1002/cem.3616","DOIUrl":"10.1002/cem.3616","url":null,"abstract":"","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 11","pages":""},"PeriodicalIF":2.1,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3616","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}