{"title":"VAE-SIMCA — Data-driven method for building one class classifiers with variational autoencoders","authors":"Akam Petersen, Sergey Kucheryavskiy","doi":"10.1016/j.chemolab.2024.105276","DOIUrl":"10.1016/j.chemolab.2024.105276","url":null,"abstract":"<div><div>The paper proposes a new method for building one class classifiers based on variational autoencoders (VAE). The classification decision is built on a linear combination of two squared distances: computed for the original and the reconstructed image as well as for the representation of the original image inside the latent space formed by VAE. Because both distances are well approximated by scaled chi-square distribution, the decision boundary is computed using the theoretical quantile function for this distribution and the predefined probability for Type I error, ⍺. Thereby the boundary does not require any specific optimization and is solely based on the model outcomes computed for the training set.</div><div>The original idea of the proposed method is inherited from another OCC approach, Data Driven Soft Independent Method for Class Analogies, where singular value decomposition is employed for building the latent space. In this paper we show how this idea can be adopted to be used with VAE for detection of anomalies on images. 
The paper describes the theoretical background, introduces the main outcomes as well as tools for visual exploration of the classification results, and shows how the method works on several simulated and real datasets.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105276"},"PeriodicalIF":3.7,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sound uncertainty-based strategy for oil spill source identification","authors":"Ana Catarina Rocha , Carla Palma , Ricardo J.N. Bettencourt da Silva","doi":"10.1016/j.chemolab.2024.105275","DOIUrl":"10.1016/j.chemolab.2024.105275","url":null,"abstract":"<div><div>Oil spills are frequent and a major environmental threat, whether they are massive or small. Therefore, authorities and experts have developed analytical chemistry tools to identify spill sources and address these illegal acts by comparing oil patterns obtained by Gas Chromatography-Mass Spectrometry analysis of the spill (Sp) and suspected sources (SS) samples. Several methodologies have proposed different balances between data processing complexity and reliability. Supported by the accessibility and validity of Microsoft Excel spreadsheets, an alternative, accurate, and user-friendly tool was developed for spill source identification based on Monte Carlo Method (MCM) simulation of correlated oil components expressed by abundance ratio (<em>DR</em>). However, the statistical control of various <em>DR</em> and the degree of similarity of samples' compositions, at defined confidence levels, impact the probability of true and false composition equivalence claim of Sp and SS becoming a challenge to recognise the offender. This work not only compares the MCM and the conventional approaches allowing to highlight the limitations that result in evidence with greater uncertainty, but also offers a statistically sound strategy that manages the probabilities of a compositional equivalence claim assessing the ability to distinguish competing spill sources and reporting the most likely polluting source with reduced uncertainty. A decision chart proposed, based on objective and statistically sound criteria, indicates the performance of consecutive <em>DR</em> comparison trials if necessary. 
The target values established for the probability of a compositional equivalence claim of the Sp and the first and second most likely SS (≥95.0 % and ≤0.50 %, respectively) provide forensic experts with sound evidence to be presented in court (likelihood ratio ≥190). This work represents a significant breakthrough in comparing complex chemical oil patterns.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105275"},"PeriodicalIF":3.7,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flame image soft sensor for oxygen content prediction based on denoising diffusion probabilistic model","authors":"Yi Liu , Angpeng Liu , Shuang Gao","doi":"10.1016/j.chemolab.2024.105269","DOIUrl":"10.1016/j.chemolab.2024.105269","url":null,"abstract":"<div><div>High-precision oxygen content measurement is crucial for statistical analysis of combustion chemical reactions. Deep learning based soft sensors are a new class of intelligent tools for monitoring combustion oxygen content. In actual production, however, data for sensors are often insufficient. A new soft sensing model is proposed to exploit the excellent performance of the denoising diffusion probabilistic model (DDPM) in data generation. Firstly, a UNet based soft sensor is designed by integrating a self-attention mechanism into the convolution layers. Then, a denoising loss function is designed to link the feature extraction process of the soft sensor model with the reverse denoising process of DDPM, and the noise prediction neural network of DDPM is used to improve the feature extraction capability of the soft sensor model. Finally, the proposed model is compared with common models. The effectiveness and superiority of the proposed soft sensing model for oxygen content prediction, especially in the case of a small sample size, are both confirmed by the results.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105269"},"PeriodicalIF":3.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of potential antitumor components in Ganoderma lucidum: A combined approach using machine learning and molecular docking","authors":"Qi Yang , Lihao Yao , Fang Jia , Guiyuan Pang , Meiyu Huang , Chengxiang Liu , Hua Luo , Lili Fan","doi":"10.1016/j.chemolab.2024.105271","DOIUrl":"10.1016/j.chemolab.2024.105271","url":null,"abstract":"<div><div>The objective of this study is to develop a reliable predictive model for antitumour activity and to identify potential antitumour components in <em>Ganoderma lucidum</em>. Four machine learning models, including Random Forest, were employed to train predictive models for antitumour activity, utilising Morgan fingerprints as molecular descriptors. The most effective model was then employed to predict the chemical composition of <em>Ganoderma lucidum</em>, identifying the four most probable compounds for molecular docking with known TNF-α-related targets. The findings of the study indicate that a Support Vector Machine (SVM) model exhibits an accuracy, F1 score, AUC, and sensitivity of 0.7638, 0.7638, 0.8332, and 0.7621, respectively. The model demonstrated an 80 % accuracy rate in predicting the antitumour activity of 10 FDA-approved drugs. Besides, the model identified 11 components in <em>Ganoderma lucidum</em>, including 3-nitroanisole, with a probability of antitumour activity exceeding 0.5, indicating their potential as antitumour agents. The results of the molecular docking procedure indicated that the four most promising antitumour compounds derived from <em>Ganoderma lucidum</em> exhibited a favourable binding affinity with the TNF-α target. In conclusion, this study incorporated a machine learning prediction step prior to molecular docking, thereby enhancing the reliability of the latter. 
Furthermore, it identified previously unreported compounds in <em>Ganoderma lucidum</em> with potential antitumour activity, such as 3-nitroanisole.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105271"},"PeriodicalIF":3.7,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectra data calibration based on deep residual modeling of independent component regression","authors":"Junhua Zheng , Zeyu Yang , Zhiqiang Ge","doi":"10.1016/j.chemolab.2024.105270","DOIUrl":"10.1016/j.chemolab.2024.105270","url":null,"abstract":"<div><div>Independent component regression (ICR) has recently become quite popular in spectra data calibration, due to its advantages in non-Gaussian data modeling and high-order statistics feature extraction. Inspired by the idea of deep learning, this paper extends the basic ICR model to a deep form by introducing a layer-wise residual learning strategy. Based on the residual information generated from the last layer of the deep learning model, more and more different patterns of independent components can be extracted layer-by-layer. Then, a further information compression step is taken to combine and condense the independent components obtained from different layers of the deep model. Two detailed benchmark case studies are implemented to evaluate the calibration performance of the developed model, based on which the effectiveness of both layer-by-layer component extraction and further information compression is well confirmed.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105270"},"PeriodicalIF":3.7,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced CO2 leak detection in soil: High-fidelity digital colorimetry with machine learning and ACES AP0","authors":"Chairul Ichsan , Navinda Ramadhan , Komang Gede Yudi Arsana , M. Mahfudz Fauzi Syamsuri , Rohmatullaili","doi":"10.1016/j.chemolab.2024.105268","DOIUrl":"10.1016/j.chemolab.2024.105268","url":null,"abstract":"<div><div>The importance of effective carbon capture and storage (CCS) in addressing climate change issues highlights the need for robust CO<sub>2</sub> leak monitoring systems. Limitations of conventional methods have prompted interest in alternative approaches, such as optical CO<sub>2</sub> sensors, which offer non-invasive and continuous monitoring. Here, we present a novel methodology for high-fidelity digital colorimetry to enhance CO<sub>2</sub> leak detection in soil, integrating machine learning algorithms with the ACES AP0 color space. Optical CO<sub>2</sub> sensors, utilizing a cresol red-based detection solution, were calibrated and validated in a controlled environment chamber designed to simulate CO<sub>2</sub> leakage. Digital images of the sensor's colorimetric response to varying CO<sub>2</sub> levels were analyzed in five color spaces. The ACES AP0 color space, renowned for its expansive color gamut and perceptual uniformity, exhibited optimal performance in discerning subtle color variations induced by changes in CO<sub>2</sub> concentration. Ten machine learning regression models were evaluated, and Multivariate Polynomial Regression (MPR) emerged as the most effective in converting ACES AP0 color data into precise CO<sub>2</sub> concentration estimates, achieving a Mean Absolute Percentage Error (MAPE) of 2.9 % and a Root Mean Square Error (RMSE) of 0.0731. 
Field validation at a carbon capture and storage (CCS) facility corroborated the robustness and accuracy of this method, showcasing its potential for real-world applications in CCS and environmental monitoring.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105268"},"PeriodicalIF":3.7,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142571294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantitative structure properties relationship (QSPR) analysis for physicochemical properties of nonsteroidal anti-inflammatory drugs (NSAIDs) using Ve degree-based reducible topological indices","authors":"Rong Fan , Abdul Rauf , Manal Elzain Mohamed Abdalla , Arif Nazir , Muhammad Faisal , Adnan Aslam","doi":"10.1016/j.chemolab.2024.105266","DOIUrl":"10.1016/j.chemolab.2024.105266","url":null,"abstract":"<div><div>Nonsteroidal Anti-Inflammatory Drugs (NSAIDs) are a class of medications that are used for different therapeutic uses. They effectively alleviate pain, reduce inflammation, and manage fever. These drugs are available in various forms. NSAIDs are prescribed by healthcare professionals to address a wide range of symptoms, from headaches and dental pain to conditions like arthritis and muscle stiffness. In this work, we use ve-degree-based reducible topological descriptors in quantitative structure-property relationship (QSPR) analysis to estimate the physicochemical properties of NSAIDs. In the first step, we have developed a MAPLE-based code to compute the reducible ve-degree-based topological descriptors of NSAIDs. Then, a linear regression model was used to estimate four physicochemical properties of seventy NSAIDs. It has been observed that two physicochemical properties, namely Molecular Weight and Complexity, show a very strong correlation with the reducible ve-degree-based topological descriptors. For both cases, the value of the correlation coefficient is greater than 0.9. Finally, quadratic and cubic regression models were constructed, and a comparative analysis with these models is presented. 
These results may help enhance the understanding of NSAIDs medication structures and aid in predicting their pharmacological activity.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105266"},"PeriodicalIF":3.7,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-the-fly spectral unmixing based on Kalman filtering","authors":"Hugues Kouakou , José Henrique de Morais Goulart , Raffaele Vitale , Thomas Oberlin , David Rousseau , Cyril Ruckebusch , Nicolas Dobigeon","doi":"10.1016/j.chemolab.2024.105252","DOIUrl":"10.1016/j.chemolab.2024.105252","url":null,"abstract":"<div><div>This work introduces an on-the-fly (i.e., online) linear spectral unmixing method which is able to sequentially analyze spectral data acquired on a spectrum-by-spectrum basis. After deriving a sequential counterpart of the conventional linear mixing model, the proposed approach recasts the linear unmixing problem into a linear state-space estimation framework. Under Gaussian noise and state models, the estimation of the pure spectra can be efficiently conducted by resorting to Kalman filtering. Interestingly, it is shown that this Kalman filter can operate in a lower-dimensional subspace to lighten the computational burden of the overall unmixing procedure. Experimental results obtained on synthetic and real Raman data sets show that this Kalman filter-based method offers a convenient trade-off between unmixing accuracy and computational efficiency, which is crucial for operating in an on-the-fly setting. The proposed method constitutes a valuable building block for benefiting from acquisition and processing frameworks recently proposed in the microscopy literature, which are motivated by practical issues such as reducing acquisition time and avoiding potential damages being inflicted to photosensitive samples. 
The code associated with the numerical illustrations reported in this paper is freely available online at https://github.com/HKouakou/KF-OSU.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105252"},"PeriodicalIF":3.7,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regression analysis with spatially-varying coefficients using generalized additive models (GAMs)","authors":"Francisco de Asis López , Javier Roca-Pardiñas , Celestino Ordóñez","doi":"10.1016/j.chemolab.2024.105254","DOIUrl":"10.1016/j.chemolab.2024.105254","url":null,"abstract":"<div><div>Regression models for spatial data have attracted the attention of researchers from different fields given their widespread application. In this work we analyze the utility of generalized additive models (GAMs) as regression methods with spatially-dependent coefficients. Particularly, three different aspects of the regression analysis were addressed: model definition and estimation, testing spatial heterogeneity, and variable selection. Spatial heterogeneity was addressed through bootstrapping, while an algorithm using the Bayesian Information Criterion (BIC) was implemented for variable selection to reduce computation time. In addition, this study compares GAMs with two of the most common methods for regression with spatially-varying coefficients: Geographically Weighted Regression (GWR) and Multiscale Geographically Weighted Regression (MGWR), using both synthetic and real data.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105254"},"PeriodicalIF":3.7,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142571293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of metrological correlation on the total combined risk in pharmaceutical equivalence evaluations","authors":"Maria Luiza de Godoy Bertanha, Felipe Rebello Lourenço","doi":"10.1016/j.chemolab.2024.105267","DOIUrl":"10.1016/j.chemolab.2024.105267","url":null,"abstract":"<div><div>Pharmaceutical equivalence evaluation requires a multiparametric conformity assessment for both generic and reference medicines. This paper investigates the impact of metrological correlations on the total combined risk in pharmaceutical equivalence evaluations. The study focused on the equivalence between ranitidine hydrochloride tablets, assessed by determining the average weight, the assay of the active pharmaceutical ingredient, and the uniformity of dosage units. The risks of false conformity decisions were evaluated using Monte Carlo method simulations across four scenarios, each reflecting different correlation conditions. The results of the study focus on evaluating pharmaceutical equivalence between ranitidine hydrochloride tablets from two manufacturers. The tablets were tested for three parameters: average weight, active pharmaceutical ingredient (API) assay, and uniformity of dosage units. The measured values were within the regulatory specifications for both medicines A and B. Four scenarios of metrological correlation were assessed: #1 – actual correlation from shared analytical steps, #2 – correlation between parameters within the same medicine, #3 – correlation between generic and reference medicines, and #4 – uncorrelated parameters. The study revealed that correlations significantly affect total and combined risk values. The correlations between different parameters of the same medicine affect the total risk values, while the correlations between generic and reference medicines for a given parameter influence the combined particular risk values. 
Correlations between parameters of the same medicine affect total risk values, while correlations between generic and reference medicines impact combined particular risk values. Both types of correlations significantly influence combined total risk values, making metrological correlations crucial in pharmaceutical equivalence evaluations. Proper consideration of these correlations ensures the quality, efficacy, and safety of generic and reference medicines.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105267"},"PeriodicalIF":3.7,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142571292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}