Lei Zhang , Guofeng Ren , Shanlian Li , Jinsong Du , Dayong Xu , Yinhua Li
{"title":"A novel soft sensor approach for industrial quality prediction based TCN with spatial and temporal attention","authors":"Lei Zhang , Guofeng Ren , Shanlian Li , Jinsong Du , Dayong Xu , Yinhua Li","doi":"10.1016/j.chemolab.2024.105272","DOIUrl":"10.1016/j.chemolab.2024.105272","url":null,"abstract":"<div><div>Complex industrial processes are often characterized by strong multivariate coupling and nonlinear dynamic changes, which pose great challenges to modeling and prediction. Traditional deep learning methods struggle to capture the spatiotemporal characteristics of industrial processes, resulting in poor prediction accuracy. To tackle this issue, we propose a novel end-to-end method named STA-TCN, which utilizes a temporal convolutional network (TCN) with both spatial and temporal attention mechanisms. The TCN uses causal and dilated convolutions to capture long temporal patterns in time series data. The spatial attention identifies the significance of different features, while the temporal attention focuses on crucial time steps. This design assigns adaptive weights to different features and emphasizes key moments to improve prediction accuracy for dynamic processes. We conduct experiments on two industrial datasets and show that the proposed STA-TCN method achieves significantly improved predictive performance compared to TCN for quality prediction of industrial processes. 
The results validate the effectiveness and robustness of the proposed method.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"257 ","pages":"Article 105272"},"PeriodicalIF":3.7,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143156194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
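The causal and dilated convolutions mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' STA-TCN code: the function names, the left zero-padding, and the assumption of one convolution per level are illustrative choices only.

```python
def causal_dilated_conv(x, w, dilation):
    """1-D causal dilated convolution: y[t] = sum_k w[k] * x[t - k*dilation].

    Indices before the start of the series are treated as zeros (left padding),
    so y[t] never depends on samples later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:
                acc += wk * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack with one causal convolution per level (an assumption)."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    print(causal_dilated_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2))
    print(receptive_field(3, [1, 2, 4, 8]))
```

Doubling the dilation at each level makes the receptive field grow exponentially with depth, which is how a TCN can cover long temporal patterns with few layers.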
Agung Surya Wibowo , Osphanie Mentari Primadianti , Hilal Tayara , Kil To Chong
{"title":"GATNM: Graph with Attention Neural Network Model for Mycobacterial Cell Wall Permeability of Drugs and Drug-like Compounds","authors":"Agung Surya Wibowo , Osphanie Mentari Primadianti , Hilal Tayara , Kil To Chong","doi":"10.1016/j.chemolab.2024.105265","DOIUrl":"10.1016/j.chemolab.2024.105265","url":null,"abstract":"<div><div>The <em>Mycobacterium tuberculosis</em> cell wall has a complex and unusual organization. This makes it difficult for nutrients and antibiotics to penetrate the wall, which contributes to the low activity of several antimycobacterial drugs in mycobacterial cells. Predicting the cell wall permeability of compounds is therefore important and would help in developing novel antitubercular drugs. Recently, many computational prediction methods have taken drug compounds as Simplified Molecular Input Line Entry System (SMILES) input. In this study, we applied computational methods to predict the permeability of the cell wall to compounds or drugs. We evaluated several common machine learning models for their ability to predict cell wall permeability. However, none of these models achieved satisfactory performance. We investigated a Graph with Attention Neural Network (GATNN) model to address this challenge. To the best of our knowledge, the GATNN model is a new approach to improving the prediction of the ability of compounds to penetrate the mycobacterial cell wall. Additionally, we used Optuna to optimize accuracy and select the best hyperparameters and the best model. On the benchmark dataset, the optimal model slightly improved on the previous model, reaching an accuracy of 78.9% and a specificity of 81.5%. As a complement, we also provide an ensemble model and an interpretability analysis of the model. 
The code and materials of all experiments in this paper can be accessed freely at this link: <span><span>https://github.com/asw1982/MTbPrediction</span></span>.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105265"},"PeriodicalIF":3.7,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianmin Li , Tian Zhao , Qin Yang , Shijie Du , Lu Xu
{"title":"A review of quantitative structure-activity relationship: The development and current status of data sets, molecular descriptors and mathematical models","authors":"Jianmin Li , Tian Zhao , Qin Yang , Shijie Du , Lu Xu","doi":"10.1016/j.chemolab.2024.105278","DOIUrl":"10.1016/j.chemolab.2024.105278","url":null,"abstract":"<div><div>Developing Quantitative Structure-Activity Relationship (QSAR) models applicable to general molecules is of great significance for molecular design in many disciplines. This paper reviews the development and current status of molecular QSAR research, including datasets, molecular descriptors, and mathematical models. A representative bibliometric analysis reveals the evolutionary trends in this field in the past decade. Based on the discussion of the advantages and shortcomings of existing methods, the requirements and possible approaches for developing a widely applicable QSAR model were put forward. This goal poses a series of challenges to QSAR, including: (1) Having a sufficient number of structure-activity relationship instances as training data to cope with the complexity and diversity of molecular structures and action mechanisms; (2) Developing and using precise molecular descriptors to avoid the situation of ‘garbage in, garbage out’, while balancing descriptor dimensions and computational costs; and (3) Using powerful and flexible mathematical models, such as deep learning models, to learn complex functional relationships between descriptors and activity. 
With the emergence of larger and higher-quality data sets, more accurate molecular descriptors and deep learning methods, predictive ability, interpretability and application domain of QSAR models will continue to improve, and it will play a more important role in various fields of molecular design.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105278"},"PeriodicalIF":3.7,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VAE-SIMCA — Data-driven method for building one class classifiers with variational autoencoders","authors":"Akam Petersen, Sergey Kucheryavskiy","doi":"10.1016/j.chemolab.2024.105276","DOIUrl":"10.1016/j.chemolab.2024.105276","url":null,"abstract":"<div><div>The paper proposes a new method for building one class classifiers based on variational autoencoders (VAE). The classification decision is built on a linear combination of two squared distances: one computed between the original and the reconstructed image, and one computed for the representation of the original image inside the latent space formed by the VAE. Because both distances are well approximated by a scaled chi-square distribution, the decision boundary is computed using the theoretical quantile function for this distribution and the predefined probability for Type I error, α. The boundary therefore does not require any specific optimization and is solely based on the model outcomes computed for the training set.</div><div>The original idea of the proposed method is inherited from another OCC approach, Data Driven Soft Independent Method for Class Analogies, where singular value decomposition is employed for building the latent space. In this paper we show how this idea can be adapted for use with a VAE to detect anomalies in images. 
The paper describes the theoretical background, introduces the main outcomes as well as tools for visual exploration of the classification results, and shows how the method works on several simulated and real datasets.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105276"},"PeriodicalIF":3.7,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ana Catarina Rocha , Carla Palma , Ricardo J.N. Bettencourt da Silva
{"title":"Sound uncertainty-based strategy for oil spill source identification","authors":"Ana Catarina Rocha , Carla Palma , Ricardo J.N. Bettencourt da Silva","doi":"10.1016/j.chemolab.2024.105275","DOIUrl":"10.1016/j.chemolab.2024.105275","url":null,"abstract":"<div><div>Oil spills are frequent and a major environmental threat, whether they are massive or small. Therefore, authorities and experts have developed analytical chemistry tools to identify spill sources and address these illegal acts by comparing oil patterns obtained by Gas Chromatography-Mass Spectrometry analysis of the spill (Sp) and suspected sources (SS) samples. Several methodologies have proposed different balances between data processing complexity and reliability. Supported by the accessibility and validity of Microsoft Excel spreadsheets, an alternative, accurate, and user-friendly tool was developed for spill source identification based on Monte Carlo Method (MCM) simulation of correlated oil components expressed as abundance ratios (<em>DR</em>). However, the statistical control of the various <em>DR</em> and the degree of similarity of the samples' compositions, at defined confidence levels, affect the probability of true and false claims of compositional equivalence between Sp and SS, which makes it challenging to identify the offender. This work not only compares the MCM with conventional approaches, highlighting the limitations that result in evidence with greater uncertainty, but also offers a statistically sound strategy that manages the probabilities of a compositional equivalence claim, assessing the ability to distinguish competing spill sources and reporting the most likely polluting source with reduced uncertainty. The proposed decision chart, based on objective and statistically sound criteria, indicates when consecutive <em>DR</em> comparison trials should be performed. 
The target values established for the probability of compositional equivalence claim of the Sp and the first and second most likely SS (≥95.0 % and ≤0.50 %, respectively) provide forensic experts with sound evidence to be presented in court (likelihood ratio ≥190). This work represents a significant breakthrough in comparing complex chemical oil patterns.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"256 ","pages":"Article 105275"},"PeriodicalIF":3.7,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142719851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
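The likelihood ratio threshold quoted in the abstract above is consistent with dividing the two target probabilities, 95.0 % by 0.50 %. The short sketch below checks that arithmetic. Reading the ratio this way is an assumption made for illustration, and the function name is hypothetical rather than taken from the paper.

```python
def likelihood_ratio(p_first_pct, p_second_pct):
    """Ratio of the equivalence-claim probabilities (both in percent) for the
    most likely and the second most likely suspected source."""
    if p_second_pct <= 0:
        raise ValueError("second probability must be positive")
    return p_first_pct / p_second_pct


if __name__ == "__main__":
    # Target values from the abstract: at least 95.0 % and at most 0.50 %.
    print(likelihood_ratio(95.0, 0.50))
```

Any first-source probability above 95.0 % or second-source probability below 0.50 % pushes the ratio above the threshold.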
{"title":"A flame image soft sensor for oxygen content prediction based on denoising diffusion probabilistic model","authors":"Yi Liu , Angpeng Liu , Shuang Gao","doi":"10.1016/j.chemolab.2024.105269","DOIUrl":"10.1016/j.chemolab.2024.105269","url":null,"abstract":"<div><div>High-precision oxygen content measurement is crucial for the statistical analysis of combustion reactions. Deep learning-based soft sensors are a new class of intelligent tools for monitoring combustion oxygen content. In actual production, however, the data available for such sensors are often insufficient. A new soft sensing model is proposed that draws on the excellent data generation performance of the denoising diffusion probabilistic model (DDPM). First, a UNet-based soft sensor is designed by integrating a self-attention mechanism into the convolution layers. Then, a denoising loss function is designed to link the feature extraction process of the soft sensor model with the reverse denoising process of the DDPM, and the noise prediction neural network of the DDPM is used to improve the feature extraction capability of the soft sensor model. Finally, the proposed model is compared with common models. The results confirm the effectiveness and superiority of the proposed soft sensing model for oxygen content prediction, especially with a small sample size.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105269"},"PeriodicalIF":3.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Qi Yang , Lihao Yao , Fang Jia , Guiyuan Pang , Meiyu Huang , Chengxiang Liu , Hua Luo , Lili Fan
{"title":"Prediction of potential antitumor components in Ganoderma lucidum: A combined approach using machine learning and molecular docking","authors":"Qi Yang , Lihao Yao , Fang Jia , Guiyuan Pang , Meiyu Huang , Chengxiang Liu , Hua Luo , Lili Fan","doi":"10.1016/j.chemolab.2024.105271","DOIUrl":"10.1016/j.chemolab.2024.105271","url":null,"abstract":"<div><div>The objective of this study is to develop a reliable predictive model for antitumour activity and to identify potential antitumour components in <em>Ganoderma lucidum</em>. Four machine learning models, including Random Forest, were employed to train predictive models for antitumour activity, utilising Morgan fingerprints as molecular descriptors. The most effective model was then employed to predict the chemical composition of <em>Ganoderma lucidum</em>, identifying the four most probable compounds for molecular docking with known TNF-α-related targets. The findings of the study indicate that a Support Vector Machine (SVM) model exhibits an accuracy, F1 score, AUC, and sensitivity of 0.7638, 0.7638, 0.8332, and 0.7621, respectively. The model demonstrated an 80 % accuracy rate in predicting the antitumour activity of 10 FDA-approved drugs. Besides, the model identified 11 components in <em>Ganoderma lucidum</em>, including 3-nitroanisole, with a probability of antitumour activity exceeding 0.5, indicating their potential as antitumour agents. The results of the molecular docking procedure indicated that the four most promising antitumour compounds derived from <em>Ganoderma lucidum</em> exhibited a favourable binding affinity with the TNF-α target. In conclusion, this study incorporated a machine learning prediction step prior to molecular docking, thereby enhancing the reliability of the latter. 
Furthermore, it identified previously unreported compounds in <em>Ganoderma lucidum</em> with potential antitumour activity, such as 3-nitroanisole.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105271"},"PeriodicalIF":3.7,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectra data calibration based on deep residual modeling of independent component regression","authors":"Junhua Zheng , Zeyu Yang , Zhiqiang Ge","doi":"10.1016/j.chemolab.2024.105270","DOIUrl":"10.1016/j.chemolab.2024.105270","url":null,"abstract":"<div><div>Independent component regression (ICR) has recently become quite popular in spectra data calibration, due to its advantages in non-Gaussian data modeling and high-order statistics feature extraction. Inspired by the idea of deep learning, this paper extends the basic ICR model to a deep form by introducing a layer-wise residual learning strategy. Based on the residual information generated by the previous layer of the deep model, further patterns of independent components can be extracted layer by layer. Then, a further information compression step is taken to combine and condense the independent components obtained from the different layers of the deep model. Two detailed benchmark case studies are implemented to evaluate the calibration performance of the developed model, based on which the effectiveness of both layer-by-layer component extraction and further information compression is well confirmed.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105270"},"PeriodicalIF":3.7,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142652891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chairul Ichsan , Navinda Ramadhan , Komang Gede Yudi Arsana , M. Mahfudz Fauzi Syamsuri , Rohmatullaili
{"title":"Enhanced CO2 leak detection in soil: High-fidelity digital colorimetry with machine learning and ACES AP0","authors":"Chairul Ichsan , Navinda Ramadhan , Komang Gede Yudi Arsana , M. Mahfudz Fauzi Syamsuri , Rohmatullaili","doi":"10.1016/j.chemolab.2024.105268","DOIUrl":"10.1016/j.chemolab.2024.105268","url":null,"abstract":"<div><div>The importance of effective carbon capture and storage (CCS) in addressing climate change issues highlights the need for robust CO<sub>2</sub> leak monitoring systems. Limitations of conventional methods have prompted interest in alternative approaches, such as optical CO<sub>2</sub> sensors, which offer non-invasive and continuous monitoring. Here, we present a novel methodology for high-fidelity digital colorimetry to enhance CO<sub>2</sub> leak detection in soil, integrating machine learning algorithms with the ACES AP0 color space. Optical CO<sub>2</sub> sensors, utilizing a cresol red-based detection solution, were calibrated and validated in a controlled environment chamber designed to simulate CO<sub>2</sub> leakage. Digital images of the sensor's colorimetric response to varying CO<sub>2</sub> levels were analyzed in five color spaces. The ACES AP0 color space, renowned for its expansive color gamut and perceptual uniformity, exhibited optimal performance in discerning subtle color variations induced by changes in CO<sub>2</sub> concentration. Ten machine learning regression models were evaluated, and Multivariate Polynomial Regression (MPR) emerged as the most effective in converting ACES AP0 color data into precise CO<sub>2</sub> concentration estimates, achieving a Mean Absolute Percentage Error (MAPE) of 2.9 % and a Root Mean Square Error (RMSE) of 0.0731. 
Field validation at a carbon capture and storage (CCS) facility corroborated the robustness and accuracy of this method, showcasing its potential for real-world applications in CCS and environmental monitoring.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105268"},"PeriodicalIF":3.7,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142571294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
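The two error metrics reported in the abstract above, MAPE and RMSE, follow their standard definitions, sketched below. The sample values are invented for illustration and are not data from the paper.

```python
import math


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error, in the same units as the measured quantity."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    # Hypothetical reference and estimated concentrations, illustration only.
    reference = [2.0, 4.0]
    estimated = [1.0, 5.0]
    print(mape(reference, estimated), rmse(reference, estimated))
```

MAPE is scale-free, while RMSE carries the units of the concentration, so the two figures complement each other when comparing regression models.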
Rong Fan , Abdul Rauf , Manal Elzain Mohamed Abdalla , Arif Nazir , Muhammad Faisal , Adnan Aslam
{"title":"Quantitative structure properties relationship (QSPR) analysis for physicochemical properties of nonsteroidal anti-inflammatory drugs (NSAIDs) using Ve degree-based reducible topological indices","authors":"Rong Fan , Abdul Rauf , Manal Elzain Mohamed Abdalla , Arif Nazir , Muhammad Faisal , Adnan Aslam","doi":"10.1016/j.chemolab.2024.105266","DOIUrl":"10.1016/j.chemolab.2024.105266","url":null,"abstract":"<div><div>Nonsteroidal Anti-Inflammatory Drugs (NSAIDs) are a class of medications used for a variety of therapeutic purposes. They effectively alleviate pain, reduce inflammation, and manage fever. These drugs are available in various forms. NSAIDs are prescribed by healthcare professionals to address a wide range of symptoms, from headaches and dental pain to conditions like arthritis and muscle stiffness. In this work, we use ve-degree-based reducible topological descriptors in quantitative structure-property relationship (QSPR) analysis to estimate the physicochemical properties of NSAIDs. In the first step, we developed a MAPLE-based code to compute the reducible ve-degree-based topological descriptors of NSAIDs. Then, a linear regression model was used to estimate four physicochemical properties of seventy NSAIDs. Two physicochemical properties, namely Molecular Weight and Complexity, show a very strong correlation with the reducible ve-degree-based topological descriptors. In both cases, the correlation coefficient is greater than 0.9. Finally, quadratic and cubic regression models were constructed, and a comparative analysis with these models is presented. 
These results may help enhance the understanding of NSAID structures and aid in predicting their pharmacological activity.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"255 ","pages":"Article 105266"},"PeriodicalIF":3.7,"publicationDate":"2024-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}