Jianmin Li, Tian Zhao, Qin Yang, Shijie Du, Lu Xu
{"title":"A review of quantitative structure-activity relationship: The development and current status of data sets, molecular descriptors and mathematical models","authors":"Jianmin Li , Tian Zhao , Qin Yang , Shijie Du , Lu Xu","doi":"10.1016/j.chemolab.2024.105278","DOIUrl":"10.1016/j.chemolab.2024.105278","url":null,"abstract":"<div><div>Developing Quantitative Structure-Activity Relationship (QSAR) models applicable to general molecules is of great significance for molecular design in many disciplines. This paper reviews the development and current status of molecular QSAR research, including datasets, molecular descriptors, and mathematical models. A representative bibliometric analysis reveals the evolutionary trends in this field in the past decade. Based on the discussion of the advantages and shortcomings of existing methods, the requirements and possible approaches for developing a widely applicable QSAR model were put forward. This goal poses a series of challenges to QSAR, including: (1) Having a sufficient number of structure-activity relationship instances as training data to cope with the complexity and diversity of molecular structures and action mechanisms; (2) Developing and using precise molecular descriptors to avoid the situation of ‘garbage in, garbage out’, while balancing descriptor dimensions and computational costs; and (3) Using powerful and flexible mathematical models, such as deep learning models, to learn complex functional relationships between descriptors and activity. 
With the emergence of larger and higher-quality data sets, more accurate molecular descriptors and deep learning methods, predictive ability, interpretability and application domain of QSAR models will continue to improve, and it will play a more important role in various fields of molecular design.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105278"},"PeriodicalIF":3.7,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VAE-SIMCA — Data-driven method for building one class classifiers with variational autoencoders","authors":"Akam Petersen, Sergey Kucheryavskiy","doi":"10.1016/j.chemolab.2024.105276","DOIUrl":"10.1016/j.chemolab.2024.105276","url":null,"abstract":"<div><div>The paper proposes a new method for building one class classifiers based on variational autoencoders (VAE). The classification decision is built on a linear combination of two squared distances: computed for the original and the reconstructed image as well as for the representation of the original image inside the latent space formed by VAE. Because both distances are well approximated by scaled chi-square distribution, the decision boundary is computed using the theoretical quantile function for this distribution and the predefined probability for Type I error, ⍺. Thereby the boundary does not require any specific optimization and is solely based on the model outcomes computed for the training set.</div><div>The original idea of the proposed method is inherited from another OCC approach, Data Driven Soft Independent Method for Class Analogies, where singular value decomposition is employed for building the latent space. In this paper we show how this idea can be adopted to be used with VAE for detection of anomalies on images. 
The paper describes the theoretical background, introduces the main outcomes as well as tools for visual exploration of the classification results, and shows how the method works on several simulated and real datasets.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105276"},"PeriodicalIF":3.7,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xianquan Wang, Fengjing Nie, Zhan Gao, Guoliang Li, Dengshuai Zhang, Jinfeng Zhang, Peijian Zhang
{"title":"Studies on QSAR models for the anti-virus effect of oseltamivir derivatives targeting H5N1 based on Mix-Kernel support vector machine","authors":"Xianquan Wang , Fengjing Nie , Zhan Gao , Guoliang Li , Dengshuai Zhang , Jinfeng Zhang , Peijian Zhang","doi":"10.1016/j.chemolab.2024.105273","DOIUrl":"10.1016/j.chemolab.2024.105273","url":null,"abstract":"<div><div>To assist the design and synthesis of novel oseltamivir derivatives (ODs) and help yield more inhibitors at lower time and cost, four quantitative structure-activity relationship (QSAR) models were established on 83 ODs using random forest (RF), extreme gradient boosting (XGBoost), radial basis kernel function support vector machine (RBF-SVM) and mix kernel function support vector machine (MIX-SVM). In the study, the five descriptors of highest importance were identified by RF. To further examine the robustness and stability of the four models, leave-one-out cross validation was applied to the models and R, R were used to measure the performance. The model built by MIX-SVM performed best on the dataset: R<sup>2</sup> on the training set and test set were 0.963 and 0.961, mean squared error (MSE) on the training set and test set were 0.055 and 0.054, and R, R were 0.918 and 0.889, respectively. 
Furthermore, five important descriptors were selected for analysis, and the modelling method with the application of MIX-SVM achieved great prediction performance, which could facilitate the design and synthesis of oseltamivir derivatives.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105273"},"PeriodicalIF":3.7,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143761165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana Catarina Rocha, Carla Palma, Ricardo J.N. Bettencourt da Silva
{"title":"Sound uncertainty-based strategy for oil spill source identification","authors":"Ana Catarina Rocha , Carla Palma , Ricardo J.N. Bettencourt da Silva","doi":"10.1016/j.chemolab.2024.105275","DOIUrl":"10.1016/j.chemolab.2024.105275","url":null,"abstract":"<div><div>Oil spills are frequent and a major environmental threat, whether they are massive or small. Therefore, authorities and experts have developed analytical chemistry tools to identify spill sources and address these illegal acts by comparing oil patterns obtained by Gas Chromatography-Mass Spectrometry analysis of the spill (Sp) and suspected sources (SS) samples. Several methodologies have proposed different balances between data processing complexity and reliability. Supported by the accessibility and validity of Microsoft Excel spreadsheets, an alternative, accurate, and user-friendly tool was developed for spill source identification based on Monte Carlo Method (MCM) simulation of correlated oil components expressed by abundance ratio (<em>DR</em>). However, the statistical control of various <em>DR</em> and the degree of similarity of samples' compositions, at defined confidence levels, impact the probability of true and false composition equivalence claim of Sp and SS becoming a challenge to recognise the offender. This work not only compares the MCM and the conventional approaches allowing to highlight the limitations that result in evidence with greater uncertainty, but also offers a statistically sound strategy that manages the probabilities of a compositional equivalence claim assessing the ability to distinguish competing spill sources and reporting the most likely polluting source with reduced uncertainty. A decision chart proposed, based on objective and statistically sound criteria, indicates the performance of consecutive <em>DR</em> comparison trials if necessary. 
The target values established for the probability of compositional equivalence claim of the Sp and the first and second most likely SS (≥95.0 % and ≤0.50 %, respectively) provide forensic experts with sound evidence to be presented in court (likelihood ratio ≥190). This work represents a significant breakthrough in comparing complex chemical oil patterns.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105275"},"PeriodicalIF":3.7,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A flame image soft sensor for oxygen content prediction based on denoising diffusion probabilistic model","authors":"Yi Liu , Angpeng Liu , Shuang Gao","doi":"10.1016/j.chemolab.2024.105269","DOIUrl":"10.1016/j.chemolab.2024.105269","url":null,"abstract":"<div><div>High-precision oxygen content measurement is crucial for statistical analysis of combustion chemical reaction. Deep learning based soft sensor is a new class of intelligent tools for monitoring combustion oxygen content. But in the actual production, data for sensors are often insufficient. A new soft sensing model is proposed to display the excellent performance of denoising diffusion probabilistic model (DDPM) in data generation. Firstly, a UNet based soft sensor is designed by integrating self-attention mechanism into the convolution layers. Then, a denoising loss function is designed to link the feature extraction process of soft sensor model with the reverse denoising process of DDPM, and the noise prediction neural network of DDPM is used to improve the feature extractability of the soft sensor model. Finally, the proposed model is compared with common models. The effectiveness and superiority of the proposed soft sensing model for oxygen content prediction, especially in the case with a small sample size, are both confirmed by the results.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105269"},"PeriodicalIF":3.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qi Yang, Lihao Yao, Fang Jia, Guiyuan Pang, Meiyu Huang, Chengxiang Liu, Hua Luo, Lili Fan
{"title":"Prediction of potential antitumor components in Ganoderma lucidum: A combined approach using machine learning and molecular docking","authors":"Qi Yang , Lihao Yao , Fang Jia , Guiyuan Pang , Meiyu Huang , Chengxiang Liu , Hua Luo , Lili Fan","doi":"10.1016/j.chemolab.2024.105271","DOIUrl":"10.1016/j.chemolab.2024.105271","url":null,"abstract":"<div><div>The objective of this study is to develop a reliable predictive model for antitumour activity and to identify potential antitumour components in <em>Ganoderma lucidum</em>. Four machine learning models, including Random Forest, were employed to train predictive models for antitumour activity, utilising Morgan fingerprints as molecular descriptors. The most effective model was then employed to predict the chemical composition of <em>Ganoderma lucidum</em>, identifying the four most probable compounds for molecular docking with known TNF-α-related targets. The findings of the study indicate that a Support Vector Machine (SVM) model exhibits an accuracy, F1 score, AUC, and sensitivity of 0.7638, 0.7638, 0.8332, and 0.7621, respectively. The model demonstrated an 80 % accuracy rate in predicting the antitumour activity of 10 FDA-approved drugs. Besides, the model identified 11 components in <em>Ganoderma lucidum</em>, including 3-nitroanisole, with a probability of antitumour activity exceeding 0.5, indicating their potential as antitumour agents. The results of the molecular docking procedure indicated that the four most promising antitumour compounds derived from <em>Ganoderma lucidum</em> exhibited a favourable binding affinity with the TNF-α target. In conclusion, this study incorporated a machine learning prediction step prior to molecular docking, thereby enhancing the reliability of the latter. 
Furthermore, it identified previously unreported compounds in <em>Ganoderma lucidum</em> with potential antitumour activity, such as 3-nitroanisole.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105271"},"PeriodicalIF":3.7,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectra data calibration based on deep residual modeling of independent component regression","authors":"Junhua Zheng , Zeyu Yang , Zhiqiang Ge","doi":"10.1016/j.chemolab.2024.105270","DOIUrl":"10.1016/j.chemolab.2024.105270","url":null,"abstract":"<div><div>Independent component regression (ICR) has recently become quite popular in spectra data calibration, due to its advantages in non-Gaussian data modeling and high-order statistics feature extraction. Inspired by the idea of deep learning, this paper extends the basic ICR model to the deep form by introducing a layer-wise residual learning strategy. Based on the residual information generated from the last layer of the deep learning model, more and more different patterns of independent components can be extracted layer-by-layer. Then, a further information compression step is taken to combine and also to condense those independent components obtained from different layers of the deep model. Two detailed benchmark case studies are implemented to evaluate the calibration performance of the developed model, based on which the effectiveness of both layer-by-layer component extraction and further information compression is well confirmed.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105270"},"PeriodicalIF":3.7,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chairul Ichsan, Navinda Ramadhan, Komang Gede Yudi Arsana, M. Mahfudz Fauzi Syamsuri, Rohmatullaili
{"title":"Enhanced CO2 leak detection in soil: High-fidelity digital colorimetry with machine learning and ACES AP0","authors":"Chairul Ichsan , Navinda Ramadhan , Komang Gede Yudi Arsana , M. Mahfudz Fauzi Syamsuri , Rohmatullaili","doi":"10.1016/j.chemolab.2024.105268","DOIUrl":"10.1016/j.chemolab.2024.105268","url":null,"abstract":"<div><div>The importance of effective carbon capture and storage (CCS) in addressing climate change issues highlights the need for robust CO<sub>2</sub> leak monitoring systems. Limitations of conventional methods have prompted interest in alternative approaches, such as optical CO<sub>2</sub> sensors, which offer non-invasive and continuous monitoring. Here, we present a novel methodology for high-fidelity digital colorimetry to enhance CO<sub>2</sub> leak detection in soil, integrating machine learning algorithms with the ACES AP0 color space. Optical CO<sub>2</sub> sensors, utilizing a cresol red-based detection solution, were calibrated and validated in a controlled environment chamber designed to simulate CO<sub>2</sub> leakage. Digital images of the sensor's colorimetric response to varying CO<sub>2</sub> levels were analyzed in five color spaces. The ACES AP0 color space, renowned for its expansive color gamut and perceptual uniformity, exhibited optimal performance in discerning subtle color variations induced by changes in CO<sub>2</sub> concentration. Ten machine learning regression models were evaluated, and Multivariate Polynomial Regression (MPR) emerged as the most effective in converting ACES AP0 color data into precise CO<sub>2</sub> concentration estimates, achieving a Mean Absolute Percentage Error (MAPE) of 2.9 % and a Root Mean Square Error (RMSE) of 0.0731. 
Field validation at a carbon capture and storage (CCS) facility corroborated the robustness and accuracy of this method, showcasing its potential for real-world applications in CCS and environmental monitoring.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105268"},"PeriodicalIF":3.7,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142571294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rong Fan, Abdul Rauf, Manal Elzain Mohamed Abdalla, Arif Nazir, Muhammad Faisal, Adnan Aslam
{"title":"Quantitative structure properties relationship (QSPR) analysis for physicochemical properties of nonsteroidal anti-inflammatory drugs (NSAIDs) using Ve degree-based reducible topological indices","authors":"Rong Fan , Abdul Rauf , Manal Elzain Mohamed Abdalla , Arif Nazir , Muhammad Faisal , Adnan Aslam","doi":"10.1016/j.chemolab.2024.105266","DOIUrl":"10.1016/j.chemolab.2024.105266","url":null,"abstract":"<div><div>Nonsteroidal Anti-Inflammatory Drugs (NSAIDs) are a class of medications that are used for different therapeutic uses. They effectively alleviate pain, reduce inflammation, and manage fever. These drugs are available in various forms. NSAIDs are prescribed by healthcare professionals to address a wide range of symptoms, from headaches and dental pain to conditions like arthritis and muscle stiffness. In this work, we use ve-degree-based reducible topological descriptors in quantitative structure-property relationship (QSPR) analysis to estimate the physicochemical properties of NSAIDs. In the first step, we have developed a MAPLE-based code to compute the reducible ve-degree-based topological descriptors of NSAIDs. Then, a linear regression model was used to estimate four physicochemical properties of seventy NSAIDs. It has been observed that two physicochemical properties, namely Molecular Weight and Complexity show a very strong correlation with the reducible ve-degree-based topological descriptors. For both cases, the value of correlation coefficient is greater than 0.9. Finally, quadratic and cubic regression models were constructed, and a comparative analysis with these models is presented. 
These results may help enhance the understanding of NSAIDs medication structures and aid in predicting their pharmacological activity.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105266"},"PeriodicalIF":3.7,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hugues Kouakou, José Henrique de Morais Goulart, Raffaele Vitale, Thomas Oberlin, David Rousseau, Cyril Ruckebusch, Nicolas Dobigeon
{"title":"On-the-fly spectral unmixing based on Kalman filtering","authors":"Hugues Kouakou , José Henrique de Morais Goulart , Raffaele Vitale , Thomas Oberlin , David Rousseau , Cyril Ruckebusch , Nicolas Dobigeon","doi":"10.1016/j.chemolab.2024.105252","DOIUrl":"10.1016/j.chemolab.2024.105252","url":null,"abstract":"<div><div>This work introduces an on-the-fly (i.e., online) linear spectral unmixing method which is able to sequentially analyze spectral data acquired on a spectrum-by-spectrum basis. After deriving a sequential counterpart of the conventional linear mixing model, the proposed approach recasts the linear unmixing problem into a linear state-space estimation framework. Under Gaussian noise and state models, the estimation of the pure spectra can be efficiently conducted by resorting to Kalman filtering. Interestingly, it is shown that this Kalman filter can operate in a lower-dimensional subspace to lighten the computational burden of the overall unmixing procedure. Experimental results obtained on synthetic and real Raman data sets show that this Kalman filter-based method offers a convenient trade-off between unmixing accuracy and computational efficiency, which is crucial for operating in an on-the-fly setting. The proposed method constitutes a valuable building block for benefiting from acquisition and processing frameworks recently proposed in the microscopy literature, which are motivated by practical issues such as reducing acquisition time and avoiding potential damages being inflicted to photosensitive samples. 
The code associated with the numerical illustrations reported in this paper is freely available online at https://github.com/HKouakou/KF-OSU.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105252"},"PeriodicalIF":3.7,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}