{"title":"Development and validation of a new method by MIR-FTIR and chemometrics for the early diagnosis of leprosy and evaluation of the treatment effect","authors":"Andrea Cristina Novack , Alexandre de Fátima Cobre , Dile Pontarolo Stremel , Luana Mota Ferreira , Michel Leandro Campos , Roberto Pontarolo","doi":"10.1016/j.chemolab.2024.105248","DOIUrl":"10.1016/j.chemolab.2024.105248","url":null,"abstract":"<div><h3>Objective</h3><div>Develop a new method for diagnosing leprosy and monitoring the pharmacological treatment effect of patients.</div></div><div><h3>Material and methods</h3><div>Plasma samples from patients diagnosed with leprosy (n = 211) who had not yet received any pharmacological treatment were collected at a basic health unit in Brazil. After treatment, samples were collected from the same patients (n = 125). Plasma samples from healthy volunteers were also collected (n = 179) and used as a control group. All samples were analyzed by Fourier transform mid-infrared spectrophotometry (MIR-FTIR). The spectral data of the samples were subjected to chemometric analysis. Principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) were used to predict diagnosis and monitor pharmacological treatment.</div></div><div><h3>Results</h3><div>The PCA model successfully distinguished among three sample classes: healthy individuals, pre-treatment leprosy patients, and post-treatment leprosy patients. The PLS-DA algorithm accurately classified healthy, treated, and diseased samples, facilitating both reliable diagnosis and treatment monitoring for leprosy. The model achieved a sensitivity of 97 %–100 %, specificity of 100 %, and accuracy ranging from 99 % to 100 %. 
Furthermore, when tested on plasma samples from patients with other conditions—renal failure (n = 1032), hypertriglyceridemia (n = 100), hypercholesterolemia (n = 100), and mixed dyslipidemia (n = 100)—the model correctly classified these as negative for leprosy, with diagnostic specificity between 93 % and 96 %.</div></div><div><h3>Conclusion</h3><div>The MIR-FTIR technique combined with PLS-DA analysis proved to be a highly effective tool for screening leprosy patients and monitoring treatment outcomes. Given its low cost, this method could be easily implemented in laboratories across emerging and low-income countries.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105248"},"PeriodicalIF":3.7,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142526138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LTFM: Long-tail few-shot module with loose coupling strategy for mineral spectral identification","authors":"Youpeng Fan , Yongchun Fang","doi":"10.1016/j.chemolab.2024.105247","DOIUrl":"10.1016/j.chemolab.2024.105247","url":null,"abstract":"<div><div>In recent years, deep learning methods have exhibited superior performance in mineral identification, especially when compared with conventional machine learning methods such as Support Vector Machine (SVM) and Partial Least Squares (PLS). Nevertheless, almost all of these deep learning methods focus on designing and improving network structures while neglecting the long-tail distribution of spectral data, which arises from uneven ore distribution and the scarcity of several natural minerals. To alleviate the interference of majority categories on minority categories, we propose the <strong>L</strong>ong-<strong>T</strong>ail <strong>F</strong>ew-shot <strong>M</strong>odule (LTFM), which is inspired by rethinking, on mineral spectral data, the popular decoupling strategy of primary representation learning followed by classifier retraining. In particular, LTFM operates in a multi-expert mode, in which the experts specialize respectively in improving feature representation learning, mitigating the long-tail effect, and alleviating the interference of few shots. Additionally, a loose coupling learning strategy is introduced to facilitate primary representation learning and to let the subsequent additional experts inherit this knowledge. Experiments on two publicly available spectral datasets show that the proposed LTFM significantly outperforms existing methods. 
In the end, extensive ablation studies are conducted to investigate the effectiveness, correctness, and robustness of our proposal.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105247"},"PeriodicalIF":3.7,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algae content prediction based on transfer learning and mean impact value","authors":"Haonan Zhang, Xiaojing Ping, Haiying Wan, Xiaoli Luan, Fei Liu","doi":"10.1016/j.chemolab.2024.105244","DOIUrl":"10.1016/j.chemolab.2024.105244","url":null,"abstract":"<div><div>To improve the prediction accuracy of algae content in different water bodies, this paper proposes a chlorophyll-a prediction method based on transfer learning (TL) and the mean impact value (MIV) algorithm. First, we preprocess the data collected from the Huai River by removing missing data and standardizing the retained data. Then, the MIV algorithm is used to reduce the dimensionality of the data and determine the input variables of the model. Based on the selected input variables, the TL algorithm is introduced to establish the chlorophyll-a prediction model. The developed method can effectively enhance the prediction accuracy, especially when the number of samples is small. The simulation results verify the effectiveness and feasibility of the proposed prediction model.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105244"},"PeriodicalIF":3.7,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142526139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent applications of analytical quality-by-design methodology for chromatographic analysis: A review","authors":"Doan Thanh Xuan , Hue Minh Thi Nguyen , Vu Dang Hoang","doi":"10.1016/j.chemolab.2024.105243","DOIUrl":"10.1016/j.chemolab.2024.105243","url":null,"abstract":"<div><div>Analytical Quality-by-Design (AQbD) represents a systematic methodology for method development. The pharmaceutical and biopharmaceutical industries have increasingly recognized and applied AQbD concepts, guided by the overall framework provided by ICH. AQbD is established to ensure that an analytical procedure is fit for its intended purpose throughout its entire lifecycle, leading to a well-understood and purpose-driven method. It guides each stage of the analytical process lifecycle by establishing the Analytical Target Profile (ATP), identifying critical method parameters (CMPs), and selecting critical method attributes (CMAs). By employing screening and response-surface experimental designs, significant factors are pinpointed and optimized through statistical analysis. This methodology aids in defining the design space or Method Operable Design Region (MODR) to ensure consistent method performance. This review delves into the foundational principles of AQbD for method development and presents its latest applications in the period 2019–2024 with reference to chromatographic analysis of both non-synthetic and synthetic compounds in different sample matrices. The implementation of AQbD proved to generate more robust chromatographic methods, enhancing their efficiency in the process. 
Nevertheless, its adoption can be hindered owing to the necessity for a comprehensive grasp of statistical analysis and experimental design, coupled with the absence of standardized directives or regulatory prerequisites.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105243"},"PeriodicalIF":3.7,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Layer-wise-residual-driven approach for soft sensing in composite dynamic system based on slow and fast time-varying latent variables","authors":"Zhengxuan Zhang , Xu Yang , Jian Huang , Yuri A.W. Shardt","doi":"10.1016/j.chemolab.2024.105245","DOIUrl":"10.1016/j.chemolab.2024.105245","url":null,"abstract":"<div><div>Driven by the requirements for a comprehensive understanding of composite dynamic systems in industrial processes, this paper investigates a new soft sensor for quality prediction based on slow and fast time-varying latent variables extraction using layer-wise residuals. First, the slow feature partial least squares were expanded into long-term dependency by introducing explicit expressions of the potential state of the process into the objective function. Then, the multilayer regression model for exploring composite dynamics driven by layer-wise residuals is developed using a serial structure that can extract both slow and fast time-varying latent variables that are completely orthogonal. Finally, the exponential-weighted partial least squares are proposed for extracting fast time-varying dynamic latent variables by learning the exponential decay properties of the time-series data correlation. 
Case studies on the industrial debutanizer and sulfur recovery unit show that the prediction accuracy of the proposed approach outperforms traditional methods.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105245"},"PeriodicalIF":3.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142446163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applicability domain of a calibration model based on neural networks and infrared spectroscopy","authors":"M. Suliany Rodríguez-Barrios , Joan Ferré , M. Soledad Larrechi , Enric Ruiz","doi":"10.1016/j.chemolab.2024.105242","DOIUrl":"10.1016/j.chemolab.2024.105242","url":null,"abstract":"<div><div>Artificial neural networks are used as calibration models in routine analytical determinations that involve spectroscopic data. To ensure that the model will generate reliable predictions for new samples, the applicability domain must be well defined. This article describes a strategy for establishing the limits of the applicability domain when the calibration model is a feed-forward neural network. The applicability domain was defined by two limits: 1) the 0.99 quantile of the squared Mahalanobis distance calculated from the network activations of the training set and 2) the 0.99 quantile of the reconstruction error of the training spectra using either an autoencoder network or a decoder network. A new sample with a squared Mahalanobis distance and/or spectral residuals beyond these limits is said to be outside the applicability domain, and the prediction is questionable. The approach was illustrated by predicting the density of diesel fuel samples from mid-infrared spectra and the fat content in meat from near-infrared spectra. 
The methodology could correctly detect anomalous spectra in prediction using either the autoencoder or the decoder.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105242"},"PeriodicalIF":3.7,"publicationDate":"2024-10-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142441357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
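The two applicability-domain limits described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_limits` and `inside_domain` are hypothetical, the network activations and the per-sample reconstruction errors are assumed to be already available as NumPy arrays (the autoencoder or decoder itself is not modelled here), and a pseudo-inverse of the covariance matrix is used for numerical safety.

```python
import numpy as np


def fit_limits(train_act, train_resid, q=0.99):
    """Derive both limits from training data (hypothetical helper).

    train_act:   (n_samples, n_units) network activations of the training set
    train_resid: (n_samples,) reconstruction error of each training spectrum
    q:           quantile used for both limits (0.99 in the abstract)
    """
    mean = train_act.mean(axis=0)
    # Pseudo-inverse is an assumption made here to tolerate collinear units.
    cov_inv = np.linalg.pinv(np.cov(train_act, rowvar=False))
    diff = train_act - mean
    # Squared Mahalanobis distance of every training sample.
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return {
        "mean": mean,
        "cov_inv": cov_inv,
        "d2_limit": np.quantile(d2, q),
        "resid_limit": np.quantile(train_resid, q),
    }


def inside_domain(limits, act, resid):
    """Return True only if a new sample is within both limits."""
    diff = act - limits["mean"]
    d2 = float(diff @ limits["cov_inv"] @ diff)
    return bool(d2 <= limits["d2_limit"] and resid <= limits["resid_limit"])


# Synthetic stand-in data, used only to show the call pattern.
rng = np.random.default_rng(0)
limits = fit_limits(rng.normal(size=(500, 4)), rng.random(500))
print(inside_domain(limits, np.zeros(4), 0.0))
```

A sample that exceeds either limit is flagged as outside the domain, matching the "and/or" wording of the abstract.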
{"title":"Machine learning based modeling for estimation of drug solubility in supercritical fluid by adjusting important parameters","authors":"Yaoyang Liu , Morug Salih Mahdi , Usama Kadem Radi , Ali Jihad , Ali Hamid AbdulHussein , Irshad Ahmad , Nasrin Mansuri , Mostafa Adnan Abdalrahman , Ahmed Alkhayyat , Ahmed Faisal","doi":"10.1016/j.chemolab.2024.105241","DOIUrl":"10.1016/j.chemolab.2024.105241","url":null,"abstract":"<div><div>Here, we employed machine learning models to predict how well the drug capecitabine dissolves in supercritical carbon dioxide, used as a green solvent. The aim is to investigate the drug's suitability for processing into nanodrugs with enhanced bioavailability in the body. In the data set used, pressure (P) and temperature (T) serve as inputs, and the solubility (Y) is the only output for building the models. This study uses decision tree (DT) and multilayer perceptron (MLP) as the core models. However, conventional algorithms used individually in their raw form may not provide accurate and general results. Ensemble methods such as boosting improve model performance. In addition, both the single models and the ensembles built on them have hyper-parameters whose optimization affects the final models. Meta-heuristic algorithms are popular for tuning hyper-parameters. In this research, we used a new hybrid framework that couples the base models with the AdaBoost algorithm (as an ensemble method) and the PO and CS algorithms (as optimizers) to obtain four different models. The MLP model boosted with AdaBoost and tuned with the PO algorithm showed the best fitting accuracy among all models. 
This model reduces the RMSE to 1.71, the MSE to 2.92, and the MAE to 1.42.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105241"},"PeriodicalIF":3.7,"publicationDate":"2024-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
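The three error figures quoted in the abstract above can be cross-checked with a short sketch. It relies only on two general facts: the RMSE is the square root of the MSE, and the MAE never exceeds the RMSE. The helper name `metrics_consistent` and the rounding tolerance of 0.01 are assumptions made for illustration.

```python
import math


def metrics_consistent(rmse, mse, mae, tol=0.01):
    """Check reported error metrics against each other (hypothetical helper).

    tol absorbs rounding of the reported values to two decimals.
    """
    squares_match = math.isclose(rmse ** 2, mse, abs_tol=tol)
    mae_bounded = mae <= rmse
    return squares_match and mae_bounded


# Values as reported in the abstract: RMSE 1.71, MSE 2.92, MAE 1.42.
print(metrics_consistent(1.71, 2.92, 1.42))
```

The units of these values are not stated in the abstract, so the check concerns internal consistency only.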
{"title":"Benchmarking multiblock methods with canonical factorization","authors":"Stéphanie Bougeard , Caroline Peltier , Benoit Jaillais , Jean-Claude Boulet , Mohamed Hanafi","doi":"10.1016/j.chemolab.2024.105240","DOIUrl":"10.1016/j.chemolab.2024.105240","url":null,"abstract":"<div><div>Data measured on the same observations and organized in blocks of variables (from different measurement sources or deduced from topics specified by the user) are common in practice. Multiblock exploratory methods are useful tools to extract information from data in a reduced and interpretable common space. However, many methods have been proposed independently, and users often struggle to select the appropriate one, especially as the methods do not always lead to the same results and their outputs do not have the same form. For this purpose, data decomposition by canonical factorization was introduced and then applied to some widely used methods: CPCA, MCOA, MFA, STATIS and CCSWA. The methods were compared on simulated (resp. real) data whose structure is controlled (resp. known). Theoretical and practical results indicate that the block-structure must be carefully explored beforehand. The number of block-variables and the block-variance distribution along dimensions impact the choice of the block-scaling. The observation-structure within and between blocks impacts the choice of the method. CPCA or MCOA mix common and specific information, STATIS highlights common structure only, whereas CCSWA focuses on specific information. 
To enable these diagnoses, the methods and the proposed comparison tools are available in <span>R</span>, <span>Matlab</span> or <span>Galaxy</span>.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105240"},"PeriodicalIF":3.7,"publicationDate":"2024-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KF-PLS: Optimizing Kernel Partial Least-Squares (K-PLS) with Kernel Flows","authors":"Zina-Sabrina Duma , Jouni Susiluoto , Otto Lamminpää , Tuomas Sihvonen , Satu-Pia Reinikainen , Heikki Haario","doi":"10.1016/j.chemolab.2024.105238","DOIUrl":"10.1016/j.chemolab.2024.105238","url":null,"abstract":"<div><div>Partial Least-Squares (PLS) regression is a widely used tool in chemometrics for performing multivariate regression. As PLS has a limited capacity of modelling non-linear relations between the predictor variables and the response, Kernel PLS (K-PLS) has been introduced for modelling non-linear predictor-response relations. Most available studies use fixed kernel parameters, reducing the performance potential of the method. Only a few studies have been conducted on optimizing the kernel parameters for K-PLS. In this article, we propose a methodology for the kernel function optimization based on Kernel Flows (KF), a technique developed for Gaussian Process Regression (GPR). The results are illustrated with four case studies. The case studies represent both numerical examples and real data used in classification and regression tasks. K-PLS optimized with KF, called KF-PLS in this study, is shown to yield good results in all illustrated scenarios, outperforming literature results and other non-linear regression methodologies. 
In the present study, KF-PLS has been compared to convolutional neural networks (CNN), random trees, ensemble methods, support vector machines (SVM), and GPR, and it has proved to perform very well.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105238"},"PeriodicalIF":3.7,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AIPs-DeepEnC-GA: Predicting anti-inflammatory peptides using embedded evolutionary and sequential feature integration with genetic algorithm based deep ensemble model","authors":"Ali Raza , Jamal Uddin , Quan Zou , Shahid Akbar , Wajdi Alghamdi , Ruijun Liu","doi":"10.1016/j.chemolab.2024.105239","DOIUrl":"10.1016/j.chemolab.2024.105239","url":null,"abstract":"<div><div>Inflammation is a biological response to harmful stimuli including infections, damaged cells, tissue injuries, and toxic chemicals. It plays an essential role in facilitating tissue repair by eliminating pathogenic microorganisms. Currently, numerous therapies are applied to treat autoimmune and inflammatory diseases. However, these conventional anti-inflammatory medications are often labor-intensive, costly, and associated with adverse side effects. Recently, researchers have identified anti-inflammatory peptides (AIPs) as a cost-effective alternative for treating several inflammatory diseases, due to their high selectivity for target cells with minimal side effects. In this study, we introduce a novel computational predictor, AIPs-DeepEnC-GA, developed to accurately predict AIP samples. The training sequences are encoded using a novel n-spaced dipeptide-based position-specific scoring matrix (NsDP-PSSM) and Pseudo position-specific scoring matrix (PsePSSM)-based embedded evolutionary features. Additionally, the reduced-amino acid alphabet (RAAA-11), and composite Physiochemical properties (CPP) are employed to capture cluster-physiochemical properties based on structural information. A hybrid feature strategy is then applied, integrating embedded evolutionary features, CPP and RAAA-11 descriptors to overcome the limitations of individual encoding methods. Minimum redundancy and maximum relevance (mRMR) is utilized to select the optimal features. The selected features are trained using four different deep-learning models. 
The predictive labels generated by these models are provided to a genetic algorithm to form a deep-ensemble training model. The proposed AIPs-DeepEnC-GA model achieved a ∼15 % increase in predictive accuracy, reaching 94.39 %, and a 19 % improvement in the area under the curve (AUC), achieving a value of 0.98 using training sequences. For independent datasets, our method obtained improved accuracies of 91.87 % and 89.21 %, with AUC values of 0.94 and 0.92 for Ind-I and Ind-II, respectively. Our proposed AIPs-DeepEnC-GA model demonstrates an 11 % improvement in predictive accuracy over existing AIP computational models using training samples. The efficacy and reliability of this model make it a promising tool for both drug development and academic research.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"254 ","pages":"Article 105239"},"PeriodicalIF":3.7,"publicationDate":"2024-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142421460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}