{"title":"Principal Components Analysis: Centring and Rotations","authors":"Richard G. Brereton","doi":"10.1002/cem.3610","DOIUrl":"https://doi.org/10.1002/cem.3610","url":null,"abstract":"<p>It is rare to perform PCA on raw data, and usually some transformation or pre-processing is performed prior to PCA.</p><p>We will illustrate the principles of centring using four 6 × 2 datasets numbered 1 to 4 as presented in Table 1, consisting of objects A to F, and will use primarily graphical approaches. As many readers will choose to centre data prior to PCA, a good understanding of centring is useful. Readers should be able to reproduce the corresponding numerical results using their favourite package, but some of the graphics might not be available in most software.</p><p>So far, this article has primarily been about geometry, but what are the practical consequences? The first and most obvious is that centring will usually change the patterns of the scores. When there are more than two components, rotations can be very complicated algebraically, and there is not room in this introductory article to discuss this in detail; a 3D rotation, for example, is often expressed as three separate 2D rotations around each of the axes [<span>5, 6</span>]. Changes in signs can be viewed as reflections in hyperplanes. The directions of PC axes can vary considerably according to whether the data are centred and often show much greater deviation than in these simple two-dimensional examples.</p><p>As such, centring is not straightforward to visualise and understand when the number of variables is large. Whether it is appropriate to centre depends on the questions being asked. For example, in spectroscopy of mixtures, we may be primarily interested in the properties of individual chemical components above a baseline, so centring might not be appropriate. In many areas, we are interested in variability around a mean and so it is best to centre the data. This article discusses the influence of column centring on PC scores.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3610","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel-Based Bootstrap Synthetic Data to Estimate Measurement Uncertainty in Analytical Sciences","authors":"Max Feinberg, Stephan Clémençon, Serge Rudaz, Julien Boccard","doi":"10.1002/cem.3628","DOIUrl":"https://doi.org/10.1002/cem.3628","url":null,"abstract":"<p>Measurement uncertainty (MU) is becoming a key figure of merit for analytical methods, and estimating MU from method validation data is cost-effective and practical. Since MU can be defined as a coverage interval of a given result, the computation of statistical prediction intervals is a possible approach, but the quality of the intervals is questionable when the number of available data is reduced. In this context, the bootstrap procedure constitutes an efficient strategy to increase the observed data variability. While applying naive bootstrap to validation data raises some computational challenges, the use of smooth bootstrap is much more interesting when synthetic data are generated using an adapted kernel density estimation algorithm. MU can be directly obtained in a very convenient way as an uncertainty function applicable to any unknown future measurement. This publication presents the advantages and disadvantages of this new method illustrated using diverse in-house and interlaboratory validation data.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3628","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Detection of Stabilizer Content in Double-Base Propellant Based on Artificial Neural Network Combined With Near-Infrared Spectroscopy","authors":"Dihua Ouyang, Tianyu Cui, Qiantao Zhang, Haoxiang Dai, Xiaowen Qin, Yaoli Hu","doi":"10.1002/cem.3632","DOIUrl":"https://doi.org/10.1002/cem.3632","url":null,"abstract":"<div><p>During long-term storage, double-base propellants are prone to chemical decomposition of internal nitrate esters, leading to decreased burn rate, reduced strength, and degraded ballistic performance. Adding an appropriate amount of Centralite-II is crucial for ensuring storage safety. This study proposes a novel method combining near-infrared spectroscopy (NIRS) with artificial intelligence to rapidly and non-destructively detect the content of Centralite-II in double-base propellants. The optimal modeling wavelength ranges of 4000–4600 cm<sup>−1</sup> and 5700–6100 cm<sup>−1</sup> were identified, and the raw spectral data were preprocessed using standard normal variate (SNV) transformation to improve the signal-to-noise ratio. Principal component analysis (PCA) was then applied to reduce data dimensionality, and the first three principal components were used as inputs for a backpropagation artificial neural network (BP-ANN). The resulting PCA-BP-ANN model showed excellent performance on the training set, with an <span></span><math><semantics><mrow><msubsup><mi>R</mi><mi>c</mi><mn>2</mn></msubsup></mrow><annotation>$$ {R}_c&#x0005E;2 $$</annotation></semantics></math> of 0.9830 and an <span></span><math><semantics><mrow><mtext>RMSEC</mtext></mrow><annotation>$$ RMSEC $$</annotation></semantics></math> of 0.0376%. During independent validation, the model demonstrated strong generalization ability, achieving an <span></span><math><semantics><mrow><msubsup><mi>R</mi><mi>p</mi><mn>2</mn></msubsup></mrow><annotation>$$ {R}_p&#x0005E;2 $$</annotation></semantics></math> of 0.9824 and an <span></span><math><semantics><mrow><mtext>RMSEP</mtext></mrow><annotation>$$ RMSEP $$</annotation></semantics></math> of 0.3179%. Comparative analysis with other models, including BP, PLS, ELM, SVR, and LSTM, indicated that the PCA-BP-ANN model exhibited superior prediction accuracy and generalization capability. This method provides a rapid and non-destructive approach for assessing the stabilizer content in double-base propellants and expands the application of NIRS and AI techniques in the field of energetic materials.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Process Fault Detection and Diagnosis Method Based on Factor Analysis: Application on the Three-Tank System Process","authors":"Cheng Zhang, Ze-hao Xu, Yu-yu Lao, Yuan Li","doi":"10.1002/cem.3627","DOIUrl":"https://doi.org/10.1002/cem.3627","url":null,"abstract":"<div><p>To address the underreporting of tiny faults by dynamic factor analysis (DFA), a novel fault detection and diagnosis method based on DFA-sliding window combined with mean square error (DFA-SWMSE) is proposed. Firstly, the data matrix is augmented by introducing time lag shifts. Secondly, factor analysis (FA) is applied to the augmented data matrix, achieving dimensionality reduction and feature extraction while retaining most of the original data's information. Then, the sliding window technique is applied to calculate the mean square error of the dimensionally reduced data, allowing for the monitoring of the system's current state and the detection of tiny faults. Finally, effective fault diagnosis is achieved through the analysis of fault factors and variable contributions. The proposed method is validated using a complex dynamic numerical example and a three-tank system process named Sim3Tanks. This system has gained widespread application in the field of process fault detection due to its ability to simulate and generate various types of faults. The proposed method is compared with principal component analysis (PCA), dynamic principal component analysis (DPCA), PCA similarity factor (SPCA), FA, and DFA. The experimental results thoroughly validate the effectiveness of the proposed method in detecting and diagnosing tiny faults in dynamic processes.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forensic Comparison of Amphetamine Chemical Profiles by Bayesian Predictive Modelling","authors":"Tuomas Korpinsalo, Juhana Rautavirta, Sami Huhtala, Tapani Reinikainen, Jukka Corander","doi":"10.1002/cem.3630","DOIUrl":"https://doi.org/10.1002/cem.3630","url":null,"abstract":"<div><p>Forensic chemists frequently employ statistical profiling approaches to assess the degree of similarity between samples of illicit drugs. Such profiling information can help reveal connections between nodes in distribution networks and manufacturing laboratories. For amphetamine, the routine method of comparing a pair of samples includes the use of a dissimilarity measure based on the Pearson correlation coefficient calculated between their chemical profiles obtained through gas chromatography–mass spectrometry. This simple measure of (dis)similarity has been shown to distinguish pairs sharing a common origin (e.g., same production batch) to a reasonable level of accuracy. However, Pearson correlation fails to capture all the relevant notions of similarity between chemical profiles of amphetamine. We present a new statistical method for forensic drug comparison that uses a more sophisticated statistical modelling approach to determine similarity between samples. We show that this leads to improved performance over the correlation-based approach. The proposed method is easily extendable and has an intuitive interpretation from both chemistry and forensic perspectives, which supports wide applicability to illicit drug profiling in practice.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chemometrics: A Vital Implement for Understanding the Water Structures by Near-Infrared Spectroscopy","authors":"Haipeng Wang, Li Han, Wensheng Cai, Xueguang Shao","doi":"10.1002/cem.3631","DOIUrl":"https://doi.org/10.1002/cem.3631","url":null,"abstract":"<div><p>Water structures play an important role in chemical and biological systems, because the structure and function of a molecule may depend on the structure of the water with which the molecule interacts. Near-infrared (NIR) spectroscopy has been proven to be powerful in analyzing the structure of water due to its sensitive response to OH. However, chemometrics is vitally important in the analysis of the NIR spectrum of water due to the low resolution of the spectrum and the complexity of the water structures. In this review, chemometric methods for structural analysis of water in aqueous systems by NIR spectroscopy, particularly in chemical and biological processes, are summarized, from the improvement of spectral resolution to the effective extraction of the spectral information of different water structures. Through the changes in the spectral features of the water structures, the structural transformation of proteins, thermo-responsive polymers, and antifreeze agents, as well as the structural variation of water during the transformation, were elucidated. Water is shown to be a good probe for analyzing the structure and interactions in aqueous solutions and chemical/biological processes by NIR spectroscopy.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Augmentation and Fault Diagnosis for Imbalanced Industrial Process Data Based on Residual Wasserstein Generative Adversarial Network With Gradient Penalty","authors":"Ying Tian, Jian Shen, Ao Wang, Zeqiu Li, Xiuhui Huang","doi":"10.1002/cem.3624","DOIUrl":"https://doi.org/10.1002/cem.3624","url":null,"abstract":"<div><p>In practical industrial applications, equipment usually operates normally and failures are relatively rare, resulting in serious imbalances in the collected data. This imbalance leads to issues such as overfitting, instability, and poor robustness, significantly reducing the accuracy and stability of fault diagnosis systems. To address these challenges, this research proposes a method for imbalanced data augmentation and industrial process fault diagnosis based on an improved Generative Adversarial Network (GAN). The method adopts the Wasserstein distance with gradient penalty and integrates residual connections into the architecture of the generator. This innovation not only helps improve gradient transfer in the generator, but also significantly enhances the data generation capabilities of the generative model by improving the stability of training. The generative model uses limited industrial process data to produce synthetic samples with high similarity and diversity. These high-quality samples improve fault diagnosis by enriching the imbalanced dataset. Experimental results on two industrial datasets confirm the method's effectiveness in enhancing fault diagnosis performance with limited data.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Past, Present and Future of Research in Analytical Figures of Merit","authors":"Alejandro Olivieri","doi":"10.1002/cem.3616","DOIUrl":"https://doi.org/10.1002/cem.3616","url":null,"abstract":"","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 11","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3616","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142665740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterization of Chemical Information and Content Prediction of Dendrobium officinale Based on ATR-FTIR","authors":"Peiyuan Li, Tao Shen, Shaobing Yang, Zhitian Zuo, Yuanzhong Wang, Qiang Hu","doi":"10.1002/cem.3626","DOIUrl":"https://doi.org/10.1002/cem.3626","url":null,"abstract":"<div><p><i>Dendrobium officinale</i> is a medicinal and food plant with high commercial and medicinal value. Yunnan is known as China's “plant kingdom,” and although the climatic conditions are favorable, the large vertical climatic differences have led to a large difference in the quality of dendrobium from different origins. The analysis of quality differences between several origins with large ecological advantages has not been reported yet. Therefore, the aim of this study is to compare these regions in terms of both morphology and chemical composition and to analyze the variation of their chemical composition in spectral information. The PLS-DA, SVM, and PLSR models were developed to qualitatively and quantitatively evaluate <i>Dendrobium</i> from different production areas. The results show that the Menghai production area was superior to other production areas in terms of phenotypic morphology, quality, and yield. Within the appropriate range, the higher the specific absorbance, the higher the quercetin content.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-Way Data Reduction Based on Essential Information","authors":"Raffaele Vitale, Azar Azizi, Mahdiyeh Ghaffari, Nematollah Omidikia, Cyril Ruckebusch","doi":"10.1002/cem.3617","DOIUrl":"https://doi.org/10.1002/cem.3617","url":null,"abstract":"<p>In this article, the idea of essential information-based compression is extended to trilinear datasets. This basically boils down to identifying and labelling the essential rows (ERs), columns (ECs) and tubes (ETs) of such three-dimensional datasets, which by themselves allow the entire space of the original measurements to be reconstructed in a linear way. ERs, ECs and ETs can be determined by exploiting convex geometry computational approaches such as convex hull or convex polytope estimations and can be used to generate a reduced version of the data at hand. These compressed data and their uncompressed counterpart share the same multilinear properties, and their factorisation (carried out by means of, for example, parallel factor analysis–alternating least squares [PARAFAC-ALS]) yields, in principle, indistinguishable results. In more detail, an algorithm for the assessment and extraction of the essential information encoded in trilinear data structures is proposed here. Its performance was evaluated in both real-world and simulated scenarios, which highlighted the benefits that this novel data reduction strategy can bring in domains like multiway fluorescence spectroscopy and imaging.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3617","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}