{"title":"Automation of Local Regression Model Building for Spectroscopic Data","authors":"Randy J. Pell, L. Scott Ramos, Brian Rohrback","doi":"10.1002/cem.3637","DOIUrl":"https://doi.org/10.1002/cem.3637","url":null,"abstract":"<div><p>The concept of using local calibration for spectroscopic analysis has been discussed since the late 1980s. Since that time, many papers have described modifications to different aspects of the local modeling methodology. In this paper, we briefly discuss some of the modifications and describe an approach for the unattended automation of local model development. Ways to reduce calculation time are discussed. Four example spectroscopic datasets using Raman, FT-NIR, and dispersive NIR are analyzed, and the local model prediction performance is compared to standard PLS prediction performance. Using independent prediction sets, local modeling is shown to improve prediction performance by 17% to 55%.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143113197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Normalization Strategies for Lipidome Data in Cell Line Panels","authors":"Hanneke Leegwater, Zhengzheng Zhang, Xiaobing Zhang, Thomas Hankemeier, Amy C. Harms, Annelien J. M. Zweemer, Sylvia E. Le Dévédec, Alida Kindt","doi":"10.1002/cem.3636","DOIUrl":"https://doi.org/10.1002/cem.3636","url":null,"abstract":"<p>Sample collection can significantly affect lipid concentration measurements in cell line panels, concealing intrinsic differences between cancer subtypes. Most quality control steps in lipidomic data analysis focus on controlling technical variation. Correcting for the total amount of biological material remains an additional challenge for cell line panels. Here, we investigated how lipidomic data acquired from multiple cell lines can be normalized to correct for differences in sample biomass. We studied how commonly used data normalization and transformation strategies influence the resulting lipid data distributions. We compared normalization by biological properties, including cell count and total protein concentration, to statistical and data-based approaches, such as median, mean, or probabilistic quotient-based normalization. We used intraclass correlations to estimate how normalization influenced the similarity between replicates. Normalizing lipidomic data by cell count improved the similarity between replicates, but only for cell lines with similar morphologies. When comparing cell line panels with diverse morphologies, neither cell count nor protein concentration was sufficient to increase the similarity of lipid abundances between cell line replicates. Data-based normalizations increased these similarities but resulted in a bias towards the large and variable lipid class of triglycerides. These artifacts are reduced by normalizing for the abundance of only structural lipids. 
We conclude that there is a delicate balance between improving the similarity between replicates and avoiding artifacts in lipidomic data and emphasize the importance of an appropriate normalization strategy in studying biological phenomena using lipidomics.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3636","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143112892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unlocking New Capabilities in the Analysis of GC × GC-TOFMS Data With Shift-Invariant Multi-Linearity","authors":"Paul-Albert Schneide, Michael Sorochan Armstrong, Neal Gallagher, Rasmus Bro","doi":"10.1002/cem.3623","DOIUrl":"https://doi.org/10.1002/cem.3623","url":null,"abstract":"<div>\u0000 \u0000 <p>This paper introduces a novel deconvolution algorithm, shift-invariant multi-linearity (SIML), which significantly enhances the analysis of data from two-dimensional gas chromatography instruments coupled to a time-of-flight mass spectrometer (GC × GC-TOFMS). Designed to address the challenges posed by retention time shifts and high noise levels, SIML incorporates wavelet-based smoothing and Fourier-transform based shift-correction within the multivariate curve resolution-alternating least squares (MCR-ALS) framework. We benchmarked the SIML algorithm against non-negativity constrained MCR-ALS and parallel factor analysis 2 with flexible coupling (PARAFAC2 × N) using both simulated and real GC × GC-TOFMS datasets. Our results demonstrate that SIML provides unique solutions with significantly improved robustness, particularly in low signal-to-noise ratio scenarios, where it maintains high accuracy in estimating mass spectra and concentrations. 
The enhanced reliability of quantitative analyses afforded by SIML underscores its potential for broad application in complex matrix analyses across environmental science, food science, and biological research.</p>\u0000 </div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143112315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Feature Variable Selection for Near-Infrared Spectroscopy Based on Simulated Annealing Bee Colony Algorithm","authors":"Jianfei Shi, Baihong Tong, Jinming Liu, Zhengguang Chen, Pengfei Li, Chong Tan","doi":"10.1002/cem.3633","DOIUrl":"https://doi.org/10.1002/cem.3633","url":null,"abstract":"<div>\u0000 \u0000 <p>Variable selection is an effective method to enhance the modeling performance of near-infrared spectroscopy. Given the promising application prospects of intelligent optimization algorithms in spectral feature variable selection, this article combines the artificial bee colony algorithm with the simulated annealing algorithm to construct a simulated annealing bee colony algorithm (SABC). To explore the feasibility of SABC for spectral variable selection, SABC was applied to construct a partial least squares spectral quantitative detection model for corn stover cellulose and soil organic matter contents. The modeling performance was compared with that of the full spectrum, genetic algorithm, simulated annealing algorithm, and artificial bee colony algorithm; it was found that the model regression precision established by SABC was the best. For the cellulose and organic matter content detection models, the coefficients of determination of the validation set were 0.9433 and 0.9853, with the relative root mean squared error of 1.7901% and 0.8011%, and the residual prediction deviation of 4.1741 and 8.3931, respectively, which could meet the corresponding actual detection needs. 
SABC adopted the strategy of multiple runs to select the repeated wavelength variables, effectively reduced variable dimensions and model complexity, improved the prediction performance of the regression model, and provided a new approach for building a high-performance near-infrared spectroscopy (NIRS) quantitative calibration model.</p>\u0000 </div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 1","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143119265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Principal Components Analysis: Centring and Rotations","authors":"Richard G. Brereton","doi":"10.1002/cem.3610","DOIUrl":"https://doi.org/10.1002/cem.3610","url":null,"abstract":"<p>It is rare to perform PCA on raw data, and usually some transformation or pre-processing is performed prior to PCA.</p><p>We will illustrate the principles of centring using four 6 × 2 datasets numbered 1 to 4 as presented in Table 1, consisting of objects A to F, and will use primarily graphical approaches. As many readers will choose to centre data prior to PCA, having a good understanding is useful. Readers should be able to reproduce the corresponding numerical results using their favourite package, but some of the graphics might not be available in most software.</p><p>So far, this article has primarily been about geometry, but what are the practical consequences? The first and most obvious is that centring usually changes the patterns of the scores. When there are more than two components, rotations can be very complicated algebraically, and there is no room in this introductory article to discuss this in detail; a 3D rotation, for example, is often expressed as three separate 2D rotations around each of the axes [5, 6]. Changes in signs can be viewed as reflections in hyperplanes. The directions of PC axes can vary considerably according to whether the data are centred and often show much greater deviation than in these simple two-dimensional examples.</p><p>As such, centring is not straightforward to visualise and understand when the number of variables is large. Whether it is appropriate to centre depends on the questions being asked. For example, in spectroscopy of mixtures, we may be primarily interested in the properties of individual chemical components above a baseline, so centring might not be appropriate. In many areas, we are interested in variability around a mean and so it is best to centre the data. 
This article discusses the influence of column centring on PC scores.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3610","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel-Based Bootstrap Synthetic Data to Estimate Measurement Uncertainty in Analytical Sciences","authors":"Max Feinberg, Stephan Clémençon, Serge Rudaz, Julien Boccard","doi":"10.1002/cem.3628","DOIUrl":"https://doi.org/10.1002/cem.3628","url":null,"abstract":"<p>Measurement uncertainty (MU) is becoming a key figure of merit for analytical methods, and estimating MU from method validation data is cost-effective and practical. Since MU can be defined as a coverage interval of a given result, the computation of statistical prediction intervals is a possible approach, but the quality of the intervals is questionable when the number of available data is reduced. In this context, the bootstrap procedure constitutes an efficient strategy to increase the observed data variability. While applying naive bootstrap to validation data raises some computational challenges, the use of smooth bootstrap is much more interesting when synthetic data are generated using an adapted kernel density estimation algorithm. MU can be directly obtained in a very convenient way as an uncertainty function applicable to any unknown future measurement. This publication presents the advantages and disadvantages of this new method illustrated using diverse in-house and interlaboratory validation data.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3628","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid Detection of Stabilizer Content in Double-Base Propellant Based on Artificial Neural Network Combined With Near-Infrared Spectroscopy","authors":"Dihua Ouyang, Tianyu Cui, Qiantao Zhang, Haoxiang Dai, Xiaowen Qin, Yaoli Hu","doi":"10.1002/cem.3632","DOIUrl":"https://doi.org/10.1002/cem.3632","url":null,"abstract":"<div><p>During long-term storage, double-base propellants are prone to chemical decomposition of internal nitrate esters, leading to decreased burn rate, reduced strength, and degraded ballistic performance. Adding an appropriate amount of Centralite-II is crucial for ensuring storage safety. This study proposes a novel method combining near-infrared spectroscopy (NIRS) with artificial intelligence to rapidly and non-destructively detect the content of Centralite-II in double-base propellants. The optimal modeling wavelength ranges of 4000–4600 cm<sup>−1</sup> and 5700–6100 cm<sup>−1</sup> were identified, and the raw spectral data were preprocessed using standard normal variate (SNV) transformation to improve the signal-to-noise ratio. Principal component analysis (PCA) was then applied to reduce data dimensionality, and the first three principal components were used as inputs for a backpropagation artificial neural network (BP-ANN). The resulting PCA-BP-ANN model showed excellent performance on the training set, with an $R_c^2$ of 0.9830 and an RMSEC of 0.0376%. 
During independent validation, the model demonstrated strong generalization ability, achieving an $R_p^2$ of 0.9824 and an RMSEP of 0.3179%. Comparative analysis with other models, including BP, PLS, ELM, SVR, and LSTM, indicated that the PCA-BP-ANN model exhibited superior prediction accuracy and generalization capability. This method provides a rapid and non-destructive approach for assessing the stabilizer content in double-base propellants and expands the application of NIRS and AI techniques in the field of energetic materials.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Process Fault Detection and Diagnosis Method Based on Factor Analysis: Application on the Three-Tank System Process","authors":"Cheng Zhang, Ze-hao Xu, Yu-yu Lao, Yuan Li","doi":"10.1002/cem.3627","DOIUrl":"https://doi.org/10.1002/cem.3627","url":null,"abstract":"<div>\u0000 \u0000 <p>To address the issue of underreporting faults in the detection of tiny faults by dynamic factor analysis (DFA), a novel fault detection and diagnosis method based on DFA-sliding window combined with mean square error (DFA-SWMSE) is proposed. Firstly, the data matrix is augmented by introducing time lag shifts. Secondly, factor analysis (FA) is applied to the augmented data matrix, achieving dimensionality reduction and feature extraction while retaining most of the original data's information. Then, the sliding window technique is applied to calculate the mean square error of the dimensionally reduced data, allowing for the monitoring of the system's current state and the detection of tiny faults. Finally, effective fault diagnosis is achieved through the analysis of fault factors and variable contributions. The proposed method is validated using a complex dynamic numerical example and a three-tank system process named Sim3Tanks. This system has gained widespread application in the field of process fault detection due to its ability to simulate and generate various types of faults. The proposed method is compared with principal component analysis (PCA), dynamic principal component analysis (DPCA), PCA similarity factor (SPCA), FA, and DFA. 
The experimental results thoroughly validate the effectiveness of the proposed method in detecting and diagnosing tiny faults in dynamic processes.</p>\u0000 </div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Forensic Comparison of Amphetamine Chemical Profiles by Bayesian Predictive Modelling","authors":"Tuomas Korpinsalo, Juhana Rautavirta, Sami Huhtala, Tapani Reinikainen, Jukka Corander","doi":"10.1002/cem.3630","DOIUrl":"https://doi.org/10.1002/cem.3630","url":null,"abstract":"<div><p>Forensic chemists frequently employ statistical profiling approaches to assess the degree of similarity between samples of illicit drugs. Such profiling information can help reveal connections between nodes in distribution networks and manufacturing laboratories. For amphetamine, the routine method of comparing a pair of samples includes the use of a dissimilarity measure based on the Pearson correlation coefficient calculated between their chemical profiles obtained through gas chromatography–mass spectrometry. This simple measure of (dis)similarity has been shown to distinguish pairs sharing a common origin (e.g., same production batch) to a reasonable level of accuracy. However, Pearson correlation fails to capture all the relevant notions of similarity between chemical profiles of amphetamine. We present a new statistical method for forensic drug comparison that uses a more sophisticated statistical modelling approach to determine similarity between samples. We show that this leads to improved performance over the correlation-based approach. 
The proposed method is easily extendable and has an intuitive interpretation both from chemistry and forensic perspectives, which supports wide applicability to illicit drug profiling in practice.</p>\u0000 </div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142861077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chemometrics: A Vital Implement for Understanding the Water Structures by Near-Infrared Spectroscopy","authors":"Haipeng Wang, Li Han, Wensheng Cai, Xueguang Shao","doi":"10.1002/cem.3631","DOIUrl":"https://doi.org/10.1002/cem.3631","url":null,"abstract":"<div><p>Water structures play an important role in chemical and biological systems, because the structure and function of a molecule may depend on the structure of the water with which the molecule interacts. Near-infrared (NIR) spectroscopy has been proven to be powerful in analyzing the structure of water due to its sensitive response to OH. However, chemometrics is vitally important in the analysis of the NIR spectrum of water due to the low resolution of the spectrum and the complexity of the water structures. In this review, chemometric methods for structural analysis of water in aqueous systems by NIR spectroscopy, particularly in chemical and biological processes, are summarized, from the improvement of spectral resolution to the effective extraction of the spectral information of different water structures. Through the changes in the spectral features of the water structures, the structural transformation of proteins, thermo-responsive polymers, and antifreeze agents, as well as the structural variation of water during the transformation, were elucidated. 
Water was proved to be a good probe for analyzing the structure and interactions in aqueous solutions and chemical/biological processes by NIR spectroscopy.</p>\u0000 </div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142860739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}