Chemometrics and Intelligent Laboratory Systems: Latest Articles

Assessing robust prediction models without test datasets: A causal discovery approach on near-infrared spectra
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-21 | DOI: 10.1016/j.chemolab.2024.105313
Minh-Quan Nguyen, Mizuki Tsuta, Mito Kokawa
Abstract: Machine learning prediction models calibrated with spectral data rely on correlations between variables without considering causation. Without genuine cause–effect relations, prediction reproducibility cannot be ensured methodically. Tools supporting causal discovery are therefore essential in spectroscopy and chemometrics to enhance robustness. Accordingly, this study draws on causal inference theory to establish the causal discovery index (CDI), which distinguishes datasets with reliable causal structures from those prone to spurious correlations. The framework was applied to seven simulated near-infrared (NIR) spectral causal structures; simulated spectra were used so that the framework's performance could be optimized and verified within a generalized methodology. Reliable structures were distinguished by differences in the mean and standard deviation of bootstrapped CDI values, and distinct thresholds for the mean and standard deviation were established at sample sizes of 1000 and 10,000. The framework performed consistently well with multiple spectral preprocessing methods, such as derivatives and dimension reduction. It was also robust to variations and outperformed conventional test-set validation without requiring additional independent datasets, which should make it useful in practical situations where data collection is limited. Moreover, because the framework encompasses only seven possible causal structures, it can be extended to various sensor-based data.
Volume 257, Article 105313
Citations: 0
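The Nguyen et al. abstract above separates reliable from unreliable causal structures using the mean and standard deviation of a bootstrapped index. The abstract does not give the CDI formula. The sketch below therefore shows only the bootstrap-summary step, with a hypothetical stand-in statistic (`abs_pearson`, the absolute Pearson correlation of paired samples). The function names, `n_boot` and the seed are illustrative assumptions, not the authors' code.

```python
import math
import random

def abs_pearson(xs, ys):
    # Hypothetical stand-in statistic; the paper's CDI formula is not given in the abstract.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate resample (constant column): treat as no correlation
    return abs(sxy) / math.sqrt(sxx * syy)

def bootstrap_mean_std(xs, ys, statistic=abs_pearson, n_boot=500, seed=0):
    """Resample (x, y) pairs with replacement; return mean and sample SD of the statistic."""
    rng = random.Random(seed)
    n = len(xs)
    values = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap resample of sample indices
        values.append(statistic([xs[i] for i in idx], [ys[i] for i in idx]))
    mean = sum(values) / n_boot
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n_boot - 1))
    return mean, std
```

A stable, strong relation gives a high mean with a near-zero spread. Weak or spurious relations show a lower mean and a larger spread. The abstract says the paper sets its thresholds on these two summaries at sample sizes of 1000 and 10,000.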
Multiplatform spectralprint strategies for the authentication of Spanish PDO fortified wines using AHIMBU, an automatic hierarchical classification tool
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-20 | DOI: 10.1016/j.chemolab.2024.105311
Rocío Ríos-Reina, M. Pilar Segura-Borrego, Jose M. Camiña, Raquel M. Callejón, Silvana M. Azcarate
Abstract: Spanish fortified wines with Protected Designation of Origin (PDO) are esteemed for their deep-rooted tradition, historical significance, and exceptional viticultural quality. Spain has four such PDOs, ‘Condado de Huelva’, ‘Jerez-Xérès-Sherry’, ‘Sanlúcar de Barrameda’, and ‘Montilla-Moriles’, which produce different types of wine: Fino and Manzanilla undergo biological aging, Olorosos undergo oxidative aging, and Amontillados undergo mixed aging. Because of their long aging periods, high production costs and, consequently, high value, these wines are susceptible to fraud, which makes robust authentication methods necessary. In response, this study explores spectroscopic techniques coupled with different chemometric approaches as rapid, straightforward, and cost-effective means of ensuring the authenticity of PDO wines. A comprehensive set of PDO fortified wines of various types and origins was analyzed by near- and mid-infrared (NIR and MIR) and ultraviolet–visible (UV–Vis) spectroscopy. The preprocessed data were modelled individually, as well as after low-level data fusion, using partial least squares-discriminant analysis (PLS-DA) and a newly available chemometric tool, the Automatic Hierarchical Model Builder (AHIMBU). The hierarchical classification models generated by AHIMBU outperformed the single PLS-DA models, offering better classification accuracy and efficiency (the correct classification rate increased by around 40 % from the single PLS-DA models to the AHIMBU models). Among the spectroscopic techniques applied, UV–Vis spectroscopy was the most effective for authentication.
Volume 257, Article 105311
Citations: 0
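In the Ríos-Reina et al. abstract above, low-level data fusion means joining the variables from several instruments (NIR, MIR, UV–Vis) into one matrix before modelling. The abstract does not state how the blocks were scaled. The sketch below assumes one common choice: autoscale each variable, then weight each block by 1/sqrt(number of variables) so that a wide NIR block does not dominate a narrow one. The function names are hypothetical, and the PLS-DA and AHIMBU modelling steps are not reproduced.

```python
import math

def autoscale_block(block):
    """Autoscale one instrument's block (rows = samples) and weight it by 1/sqrt(n_vars)."""
    n = len(block)
    n_vars = len(block[0])
    cols = list(zip(*block))
    means = [sum(c) / n for c in cols]
    # Sample SD per variable; a constant variable gets SD 1.0, so it becomes all zeros.
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1)) or 1.0
           for c, m in zip(cols, means)]
    weight = 1.0 / math.sqrt(n_vars)  # total block variance = 1 when no variable is constant
    return [[(row[j] - means[j]) / sds[j] * weight for j in range(n_vars)] for row in block]

def low_level_fusion(*blocks):
    """Concatenate the scaled variables of several blocks, sample by sample."""
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all blocks must contain the same samples in the same order")
    scaled = [autoscale_block(b) for b in blocks]
    return [[v for s in scaled for v in s[i]] for i in range(n_samples)]
```

The fused rows can then go to any classifier. Block weighting is only one option. Plain concatenation without weighting is also called low-level fusion.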
Utilization of artificial intelligence for evaluation of targeted cancer therapy via drug nanoparticles to estimate delivery efficiency to various sites
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-17 | DOI: 10.1016/j.chemolab.2024.105309
Wael A. Mahdi, Adel Alhowyan, Ahmad J. Obaidullah
Abstract: Poor delivery efficiency of drug nanoparticles to tumor sites is a major obstacle to the development of targeted cancer therapy. The type of nanocarrier and its shape, size, materials, and physicochemical properties strongly influence delivery efficiency and should be well understood. This study presents a machine learning approach to predict the delivery efficiency of nanoparticles to various organs in targeted cancer therapy, focusing on three regression models: Gaussian Process Regression (GPR), Extra Trees (ET) regression, and Local Polynomial Regression (LPR). Applying these models to a complex biomedical dataset of 534 records describing nanoparticle properties and their distribution across the tumor, heart, liver, spleen, lung, and kidney demonstrates their potential to improve predictive accuracy in chemical and biological processes. GPR, a non-parametric probabilistic model, was selected for its robustness on small, intricate datasets with complex nonlinear relationships and for its precise uncertainty quantification. ET regression, an ensemble learning method, was chosen for its resilience against overfitting in high-dimensional data, which stems from building multiple unpruned decision trees with randomized splits. LPR was included for its ability to capture local trends in the data, providing nuanced predictions without assuming a global parametric form. The dataset underwent rigorous preprocessing, including missing-data imputation with Multivariate Imputation by Chained Equations (MICE), outlier detection with Subspace Outlier Detection (SOD), and feature selection with Conditional Mutual Information (CMI). Z-score normalization was applied to standardize the features, aligning them with the Gaussian assumptions of GPR and improving the overall performance of the models. The models were tuned with the Whale Optimization Algorithm (WOA) to maximize predictive accuracy, and the GPR and ET models showed significant improvements over baseline models in predicting biodistribution outcomes.
Volume 257, Article 105309
Citations: 0
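The Mahdi et al. abstract above picks GPR partly for its uncertainty quantification. A GPR prediction returns a mean and a variance, and the variance grows away from the training data. The sketch below is a minimal zero-mean GP under stated assumptions: it uses a squared-exponential kernel with fixed, hypothetical hyperparameters (`length_scale`, `signal_var`, `noise_var`) and a hand-written Cholesky solve. The paper tuned its models with WOA, which is not reproduced here.

```python
import math

def rbf(a, b, length_scale=1.0, signal_var=1.0):
    # Squared-exponential kernel between two feature vectors (assumed kernel choice).
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return signal_var * math.exp(-0.5 * d2 / length_scale ** 2)

def cholesky(A):
    # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution for L x = b.
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    # Back substitution for L^T x = b.
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x

def gp_predict(X, y, x_star, noise_var=1e-6, **kernel_args):
    # Zero-mean GP posterior: mean = k*^T (K + s2 I)^-1 y, var = k(x*,x*) - k*^T (K + s2 I)^-1 k*.
    n = len(X)
    K = [[rbf(X[i], X[j], **kernel_args) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))
    k_star = [rbf(xi, x_star, **kernel_args) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_star, x_star, **kernel_args) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clip tiny negative values caused by rounding
```

Near a training point, the predicted variance is close to `noise_var`. Far from all training points, the mean falls back to the prior (zero) and the variance rises to `signal_var`. This is one reason z-scoring the features, as the abstract describes, matters: one `length_scale` then behaves sensibly across all features.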
Optimal designs for mixture choice experiments by simulated annealing
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-16 | DOI: 10.1016/j.chemolab.2024.105305
Yicheng Mao, Roselinde Kessels
Abstract: Mixture choice experiments investigate people’s preferences for products composed of different ingredients. To ensure the quality of the experimental design, many researchers use Bayesian optimal design methods, and efficient search algorithms are essential for obtaining such designs. Yet research in this area for mixture choice experiments is not extensive. Our paper pioneers the use of a simulated annealing (SA) algorithm to construct Bayesian optimal designs for mixture choice experiments. Our SA algorithm not only accepts better solutions but also accepts inferior solutions with a certain probability. This prevents premature convergence and enables broader exploration of the solution space. Although our SA algorithm may start more slowly than the widely used mixture coordinate-exchange method, it generally produces higher-quality mixture choice designs after a reasonable runtime. We demonstrate the superior performance of our SA algorithm through extensive computational experiments and a real-life example.
Volume 257, Article 105305
Citations: 0
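The Mao and Kessels abstract above credits SA with accepting worse solutions "with a certain probability". The standard way to do this is the Metropolis rule: accept a worse move with probability exp(-Δ/T) and lower T gradually. The sketch below is a generic SA loop on a toy integer problem. The Bayesian mixture-design criterion, the design neighbourhood and the cooling schedule used in the paper are not given in the abstract. The defaults here (`t0=10`, geometric `cooling=0.95`, `steps=2000`) and the toy objective are illustrative assumptions.

```python
import math
import random

def acceptance_probability(delta, temperature):
    # Metropolis rule: improvements (delta <= 0) are always accepted,
    # worse moves are accepted with probability exp(-delta / T).
    if delta <= 0:
        return 1.0
    return math.exp(-delta / temperature)

def simulated_annealing(cost, initial, neighbour, t0=10.0, cooling=0.95, steps=2000, seed=0):
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        if rng.random() < acceptance_probability(candidate_cost - current_cost, t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t = max(t * cooling, 1e-12)  # geometric cooling, floored to avoid dividing by zero
    return best, best_cost

def step(x, rng):
    # Toy neighbourhood (hypothetical): move one unit left or right within [0, 20].
    return min(20, max(0, x + rng.choice((-1, 1))))

best, best_cost = simulated_annealing(lambda x: (x - 7) ** 2, 0, step)
```

Early on, while T is high, the search wanders almost freely. As T falls, uphill moves become rare and the loop behaves like greedy descent. Tracking `best` separately means the returned answer is never worse than the best state visited.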
An accurate prediction of drug–drug interactions and side effects by using integrated convolutional and BiLSTM networks
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-15 | DOI: 10.1016/j.chemolab.2024.105304
Sabir Ali, Waleed Alam, Hilal Tyara, Kil To Chong
Abstract: The use of multiple drugs has gained attention for the treatment of complex diseases. However, while drugs offer benefits, they can also cause undesirable side effects, so accurate prediction of drug–drug interactions is crucial in drug discovery and safety research, and an efficient, reliable computational method is needed to predict drug–drug interactions and their associated side effects. In this study, we introduce a computational method that integrates convolutional and BiLSTM networks to predict the types of drug–drug interactions. Morgan fingerprints were used to encode each drug's SMILES, and a structural similarity profile based on the Tanimoto coefficient was used to determine similarities. The encoded drugs were passed through convolutional and BiLSTM layers to extract important feature maps, and ReLU-activated dense layers were used for feature dimensionality reduction. The final dense layer used the softmax function to classify 86 types of drug–drug interactions. The proposed model achieved an accuracy of 95.38% and an AUC of 98.78%, outperforming existing state-of-the-art models.
Volume 257, Article 105304
Citations: 0
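The Ali et al. abstract above builds drug features from a Tanimoto-coefficient structural similarity profile. The Tanimoto coefficient of two binary fingerprints is |A ∩ B| / (|A| + |B| − |A ∩ B|). A drug's profile is then its vector of similarities to a fixed set of reference drugs. The sketch assumes fingerprints are already available as sets of on-bit indices; generating real Morgan fingerprints from SMILES needs a cheminformatics toolkit and is not shown. `pair_features`, which concatenates the two drugs' profiles, is a hypothetical way to form a network input and is not taken from the paper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def similarity_profile(query, references):
    """Tanimoto similarities of one drug against a fixed, ordered list of reference drugs."""
    return [tanimoto(query, ref) for ref in references]

def pair_features(drug_a, drug_b, references):
    # Hypothetical input for a drug-pair classifier: both profiles, concatenated.
    return similarity_profile(drug_a, references) + similarity_profile(drug_b, references)
```

Because every drug is compared against the same ordered reference list, each profile has a fixed length. A convolutional or recurrent network needs exactly that kind of fixed-length input.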
Optimization methods for tensor decomposition: A comparison of new algorithms for fitting the CP(CANDECOMP/PARAFAC) model
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-13 | DOI: 10.1016/j.chemolab.2024.105290
Huiwen Yu, Kasper Green Larsen, Ove Christiansen
Abstract: Tensor decomposition is widely used for multi-way data analysis and computation in chemical science. CP decomposition is one of the most useful tensor decomposition models for capturing the essential information in massive multi-way chemical data and for efficiently performing computations with such tensors. However, computing the decomposition itself efficiently and accurately is a nontrivial problem that sometimes limits the advantage of tensor decomposition methods. In this work, we propose and test three new decomposition algorithms derived by applying extrapolation ideas to the alternating least squares (ALS) algorithm for CP tensor decomposition. The performance of the proposed algorithms is validated on a variety of simulated datasets and on real experimental datasets, including fluorescence spectroscopy, hyperspectral, and electroencephalogram (EEG) data. The results show that the proposed algorithms significantly accelerate the standard CP-ALS decomposition while maintaining favorable accuracy. One of the proposed methods, denoted direct inversion in the iterative subspace (DIIS)-like extrapolated ALS (CP-AD), is inspired by extrapolation procedures widely used to solve non-linear equations in quantum chemistry, and it shows a particularly attractive combination of far fewer iterations to convergence and modest computational cost. For example, CP-AD produced tensors of similar accuracy at significantly lower computational cost than both the standard CP-ALS algorithm and the widely used line-search-based CP-ALS extrapolation procedure. The proposed methodology may therefore boost the application of tensor decomposition modeling in both experimental and computational chemistry.
Volume 257, Article 105290
Citations: 0
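The Yu et al. abstract above accelerates the standard CP-ALS iteration. In plain ALS, each factor is updated as the exact least-squares solution while the other factors are held fixed. The sketch below shows that baseline for the simplest case only: a rank-1 CP model of a 3-way tensor stored as nested lists. It does not implement the proposed extrapolations (CP-AD and the others). Rank R > 1 would replace the scalar denominators with R × R linear solves built from Khatri–Rao products. The all-ones starting vectors are an arbitrary assumption.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank1_cp_als(X, n_iter=10):
    """Plain ALS for a rank-1 CP model X[i][j][k] ~ a[i] * b[j] * c[k] (nested lists)."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K  # all-ones start (an arbitrary choice)
    for _ in range(n_iter):
        # Each update is the exact least-squares solution for one factor with the other two fixed.
        a = [sum(X[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
             / (dot(b, b) * dot(c, c)) for i in range(I)]
        b = [sum(X[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
             / (dot(a, a) * dot(c, c)) for j in range(J)]
        c = [sum(X[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
             / (dot(a, a) * dot(b, b)) for k in range(K)]
    return a, b, c

def cp_residual(X, a, b, c):
    """Frobenius norm of X minus its rank-1 reconstruction."""
    return math.sqrt(sum((X[i][j][k] - a[i] * b[j] * c[k]) ** 2
                         for i in range(len(a)) for j in range(len(b)) for k in range(len(c))))
```

For an exactly rank-1 tensor, one sweep already recovers it up to scaling. On real data, plain ALS can need many slow sweeps. That slowness is what the extrapolation schemes described in the abstract target.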
Stacking ensemble learning algorithm based rapid inverse modelling of copper grade using imaging spectral data
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-12 | DOI: 10.1016/j.chemolab.2024.105308
Jingli Wang, Jingxiang Gao
Abstract: Determining copper ore grade quickly and accurately is of great practical significance for ore dressing and ore allocation in mines. The most common method is chemical analysis, but it has several disadvantages, including a lengthy determination period, possible chemical pollution, and delayed results for ore dressing and allocation. Hyperspectral imaging provides both spectral and image resolution and can obtain indicators of a sample while preserving its original physical and chemical properties. This overcomes the shortcomings of traditional methods and allows accurate, non-destructive, environmentally friendly, and rapid detection. Stacking combines the predictions of multiple models and can therefore often achieve higher predictive accuracy than a single model, with the added advantages of reduced overfitting, model diversity, flexibility, and adaptability; however, the Stacking ensemble learning algorithm has rarely been used for hyperspectral quantitative inversion modelling. In this study, 138 copper samples from the Mirador Copper Mine were used as the data source. Spectral data were collected with a Pika L and a Pika NIR-320 hyperspectral imager, and copper grades were determined by chemical analysis. First, the raw spectral data were serially fused on the basis of mutual information, and the fused data were smoothed with a Savitzky–Golay (SG) filter to remove experimental noise. Next, feature bands were extracted from the preprocessed spectra with the CARS and CARS-SPA algorithms to eliminate uninformative variables and retain valid spectral information. Finally, a highly reliable copper grade estimation model was built with the Stacking algorithm by combining various machine learning methods, and transfer learning was used to verify the accuracy and generalisation of the model. The results indicate that the feature bands selected by CARS-SPA cover spectral ranges with sufficient chemical information while largely excluding uninformative variables, which notably increases the speed and accuracy of the inversion modelling. The Stacking ensemble model is better suited than a single inversion model to predicting copper grade at the Mirador copper mine, and the CARS-SPA-Stacking inversion model achieved the highest accuracy, with R², RMSE, MAE, RPD, MAPE, and CV of 0.936, 0.040, 0.019, 4.018, 0.059, and 0.267, respectively. This study applies fused imaging spectral data, together with the Stacking ensemble learning algorithm, to copper grade inversion at the Mirador copper mine.
Volume 257, Article 105308
Citations: 0
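The Wang and Gao abstract above reports six accuracy figures (R², RMSE, MAE, RPD, MAPE, CV). The sketch computes all six from reference and predicted values. R², RMSE and MAE have standard definitions. The other three are assumptions, because the abstract does not define them. RPD is taken as the sample SD of the reference values divided by RMSE. MAPE is expressed as a fraction, consistent with the reported 0.059. CV is taken as RMSE divided by the mean reference value. The paper may use other conventions.

```python
import math
import statistics

def regression_metrics(y_true, y_pred):
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RPD": statistics.stdev(y_true) / rmse,  # assumed: sample SD of references / RMSE
        "MAPE": sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,  # fraction, not %
        "CV": rmse / mean_true,  # assumed: RMSE relative to the mean reference value
    }

m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

In this toy call, a single prediction is off by one unit. That gives R² = 0.8, RMSE = 0.5 and MAE = 0.25. Because RPD and CV both divide by or into RMSE, their definitions should be checked before comparing values across papers.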
Improved multivariate sensor delay estimation using a hierarchical clustering-based approach
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-09 | DOI: 10.1016/j.chemolab.2024.105306
Bente M. van Son, Tim Offermans, Carlo G. Bertinetto, Jeroen J. Jansen
Abstract: An often overlooked challenge in multivariate statistical modelling of industrial data is the presence of time delays caused by residence time in the process, which leads to misaligned events. For accurate data analysis, these delays must be estimated and corrected in a dedicated preprocessing step. Despite the multivariate nature of process data, most existing statistical Time Delay Estimation (TDE) methods consider only bivariate correlations. This study hypothesized that multivariate TDE methods would outperform bivariate methods, particularly with a large number of sensors. To test this, we selected data subsets with varying numbers of sensors using correlation-based hierarchical clustering and applied different TDE methods. Two multivariate methods, PLS-CON-LOAD and PLS-SEQ, outperformed the bivariate methods, with lower time delay estimation errors and less sensitivity to the number of sensors. Additionally, we proposed an enhancement to the TDE methods that embeds a clustering step to determine the order in which time delays are estimated. This approach reduced TDE errors for all methods when the number of sensors is high. We recommend the newly proposed clustering-based PLS-CON-LOAD method for low-error time delay estimation, which enhances the predictive value and insights obtainable from industrial data analysis.
Volume 257, Article 105306
Citations: 0
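The van Son et al. abstract above compares its multivariate methods against bivariate TDE. The simplest bivariate approach shifts a downstream signal against an upstream one and keeps the lag with the highest Pearson correlation. The sketch shows that baseline only, with one assumption: the downstream sensor lags the upstream one by 0 to `max_lag` samples. The PLS-CON-LOAD and PLS-SEQ methods and the clustering-based ordering are not reproduced, and the function names are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def estimate_delay(upstream, downstream, max_lag):
    """Lag (in samples) at which downstream best correlates with the earlier upstream signal."""
    n = len(upstream)
    best_lag, best_r = 0, -math.inf
    for lag in range(max_lag + 1):
        # Align upstream[t] with downstream[t + lag]; both slices have length n - lag.
        r = pearson(upstream[: n - lag], downstream[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [rng.gauss(0.0, 1.0) for _ in range(3)] + x[:-3]  # synthetic sensor: y(t) = x(t - 3)
lag, r = estimate_delay(x, y, max_lag=10)
```

This estimator looks at one sensor pair at a time. With many sensors, pairwise estimates can be inconsistent with one another. The abstract reports that multivariate methods are less sensitive to this.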
In silico prediction of metabolic stability for ester-containing molecules: Machine learning and quantum mechanical methods
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-09 | DOI: 10.1016/j.chemolab.2024.105292
Shiwei Deng, Yiyang Wu, Zhuyifan Ye, Defang Ouyang
Abstract: The carboxylic ester is an important functional group frequently used in the design of prodrugs and soft drugs, so it is critical to understand the structure–metabolic stability relationships of such drugs. This work aims to predict the metabolic stability of ester-containing molecules in human plasma/blood using both machine learning and quantum mechanical methods. A dataset of metabolic half-lives for 656 molecules was collected for the machine learning models. Three molecular representations (extended-connectivity fingerprints, Chemopy descriptors, and Mordred3D descriptors) were combined with four machine learning algorithms (LightGBM, support vector machine, random forest, and k-nearest neighbors). Ensemble learning was then applied to integrate the predictions of the individual models and improve the results. The consensus model reached coefficients of determination of 0.793 on the test set and 0.695 on the external validation set. Feature importances were interpreted with SHapley Additive exPlanations (SHAP) and were consistent with the previously reported mechanism of esterase-catalyzed hydrolysis. Moreover, a quantum mechanical model was built to calculate the energy gap of the esterase-catalyzed hydrolysis reaction, from which metabolic stability ranks were derived. The ability of the quantum mechanical model to discriminate the relative metabolic stability of molecules in the external validation set was compared with that of the machine learning model, and the advantages and disadvantages of both approaches for metabolic stability prediction were discussed. In summary, this work can serve as an in silico high-throughput screening tool to accelerate the early development of prodrugs and soft drugs.
Volume 257, Article 105292
Citations: 0
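The Deng et al. abstract above integrates the predictions of individual models into a consensus model, but it does not say how. The sketch assumes the simplest option: an unweighted mean of the per-molecule predictions, scored with the coefficient of determination. Weighted averaging or a learned meta-model are equally plausible, and the function names are hypothetical.

```python
def consensus(predictions):
    """Unweighted mean of per-molecule predictions (one list per model; assumed scheme)."""
    n_models = len(predictions)
    return [sum(vals) / n_models for vals in zip(*predictions)]

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 3.5, 4.5]  # toy model biased high by 0.5
model_b = [0.5, 1.5, 2.5, 3.5]  # toy model biased low by 0.5
avg = consensus([model_a, model_b])
```

The toy case is contrived: the two models' errors cancel exactly. In practice, averaging helps mainly when individual models make partly independent errors, for example because they use different molecular representations.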
Estimating oil recovery efficiency of carbonated water injection with supervised machine learning paradigms and implications for uncertainty analysis
IF 3.7 | Zone 2 (Chemistry)
Chemometrics and Intelligent Laboratory Systems | Pub date: 2024-12-08 | DOI: 10.1016/j.chemolab.2024.105303
Joshua Nsiah Turkson, Muhammad Aslam Md Yusof, Ingebret Fjelde, Yen Adams Sokama-Neuyam, Victor Darkwah-Owusu
Abstract: Few efforts have been made to develop a time-efficient and cost-effective predictive model capable of estimating the oil recovery efficiency of carbonated water injection (CWI). In this study, we therefore used supervised machine learning (ML) techniques, namely decision tree, support vector regression, and random forest (RF), to predict the recovery efficiency of CWI, with experimental conditions, rock properties, and fluid properties as predictors. The influence of the various parameters on oil recovery efficiency was assessed using correlation analysis, permutation importance, and Shapley Additive Explanations (SHAP), which sets this study apart from existing ones. Overall, the ML models predicted recovery efficiency well, with coefficients of determination of 0.81–0.87, mean absolute errors of 4.30–4.96 %, and root mean square errors of 4.82–5.89 %. The RF model outperformed the other models. Most importantly, it successfully predicted the recovery efficiency on entirely new data, with an error of less than 15 % and an absolute relative error of less than 19 %. According to the SHAP analysis, higher injection rate, porosity, permeability, and pressure improve oil recovery, and lower values reduce it. Similarly, lower temperature, oil density, oil viscosity, and salinity enhance oil recovery, while injection rate and temperature were the most and least influential parameters, respectively. The RF model was successfully deployed together with Monte Carlo simulation to predict the oil recovery efficiency for 1000 randomly generated sets of independent variables, demonstrating its applicability to uncertainty analysis. This modeling study not only bridges knowledge gaps in predictive modeling of CWI oil recovery efficiency but also holds significant promise for rapid estimation and optimization of oil recovery efficiency.
Volume 257, Article 105303
Citations: 0
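The Turkson et al. abstract above pairs a trained RF model with Monte Carlo simulation over 1000 random input sets. The pattern is: draw random inputs within plausible ranges, push each draw through the model, and summarise the output distribution. The sketch makes several assumptions not taken from the paper. `surrogate_recovery` is a hypothetical linear stand-in for the trained RF model; its signs loosely follow the abstract's SHAP directions, but its coefficients are invented. The input ranges are normalised to [0, 1] and sampled uniformly. P10/P50/P90 are reported here as plain 10th/50th/90th percentiles, not the exceedance convention some reservoir-engineering texts use.

```python
import random
import statistics

# Hypothetical, normalised input ranges (not from the paper).
INPUT_RANGES = {"injection_rate": (0.0, 1.0), "porosity": (0.0, 1.0), "temperature": (0.0, 1.0)}

def surrogate_recovery(x):
    """Toy stand-in for a trained model: recovery (%) rises with rate and porosity, falls with temperature."""
    return 40.0 + 8.0 * x["injection_rate"] + 4.0 * x["porosity"] - 2.0 * x["temperature"]

def monte_carlo(model, ranges, n_draws=1000, seed=0):
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_draws):
        sample = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        outputs.append(model(sample))
    deciles = statistics.quantiles(outputs, n=10)  # 9 cut points: 10th, 20th, ..., 90th percentile
    return outputs, {"P10": deciles[0], "P50": deciles[4], "P90": deciles[8]}

outs, summary = monte_carlo(surrogate_recovery, INPUT_RANGES)
```

Replacing `surrogate_recovery` with any fitted model's predict function keeps the loop unchanged. Uniform sampling is the weakest part of this sketch. Real studies would usually sample from distributions fitted to field or laboratory data.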