Chemometrics and Intelligent Laboratory Systems: Latest Articles

Multivariate SPC via sequential multiblock-PLS
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-09-21 DOI: 10.1016/j.chemolab.2024.105236
Joan Borràs-Ferrís, Carl Duchesne, Alberto Ferrer
Volume 254, Article 105236
Abstract: Sequential multi-block partial least squares (SMB-PLS) is proposed for implementing a multivariate statistical process control scheme. This is of interest when the system is composed of several blocks that follow a sequential order and present correlated information, for instance, a raw material properties block followed by a process variables block that is manipulated according to the raw material properties. SMB-PLS uses orthogonalization to separate the information correlated between blocks from the orthogonal variations. This allows the system to be monitored in different stages, considering only the remaining orthogonal part in each block. Thus, SMB-PLS increases interpretability and process understanding during model building (Phase I), since it provides deep insight into the nature of the system variations. In addition, it prevents any special cause from propagating to subsequent blocks, enabling their use in model exploitation (Phase II). The methodology is applied to a real case study from a food manufacturing process.
Citations: 0
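The orthogonalization idea in the SMB-PLS abstract above can be illustrated with a small sketch. This is not the authors' algorithm: it only removes from a later block the part that is linearly explained by one score vector from an earlier block, so that what remains is orthogonal to it. The function names, the toy data, and the use of a single score vector are assumptions made for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def orthogonalize_block(block, score):
    """Remove from each column of `block` its projection onto `score`.

    block: list of columns (each a list of n mean-centered observations)
    score: list of n values, e.g. a score vector from the preceding block
    Both names are hypothetical; the paper's notation may differ.
    """
    tt = dot(score, score)
    residual = []
    for col in block:
        coef = dot(col, score) / tt  # least-squares loading on the score
        residual.append([x - coef * t for x, t in zip(col, score)])
    return residual


# Toy data: a raw-material score and two mean-centered process variables.
t_raw = [-1.5, -0.5, 0.5, 1.5]
process = [
    [-2.9, -1.1, 0.9, 3.1],  # mostly driven by the raw-material score
    [0.5, -0.5, -0.5, 0.5],  # already orthogonal to the raw-material score
]
process_orth = orthogonalize_block(process, t_raw)
```

In this toy case the first process variable is reduced to a small residual, while the second is left unchanged because it carries no information correlated with the raw-material score.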
Corrigendum to “Random projection ensemble conformal prediction for high-dimensional classification” [Chemometr. Intell. Lab. Syst. 253 (2024) 1–10, 105225]
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-09-18 DOI: 10.1016/j.chemolab.2024.105235
Xiaoyu Qian, Jinru Wu, Ligong Wei, Youwu Lin
Volume 254, Article 105235
Citations: 0
A scalable, data analytics workflow for image-based morphological profiles
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-09-16 DOI: 10.1016/j.chemolab.2024.105232
Edvin Forsgren, Olivier Cloarec, Pär Jonsson, Gillian Lovell, Johan Trygg
Volume 254, Article 105232
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0169743924001722/pdfft?md5=8447cc5a34c516ef2e46efef43419f28&pid=1-s2.0-S0169743924001722-main.pdf
Abstract: Cell Painting is an established community-based microscopy-assay platform that provides high-throughput, high-content data for biological readouts. In November 2022, the JUMP-Cell Painting Consortium released the largest publicly available Cell Painting dataset with CellProfiler features, comprising more than 2 billion cell images. This dataset is designed for predicting the activity and toxicity of 115k drug compounds, with the aim of making cell images as computable as genomes and transcriptomes. In this context, our paper introduces a scalable and computationally efficient data analytics workflow created to meet the needs of researchers. This data-driven workflow facilitates the comparison of drug treatment effects through significant and biologically relevant insights. The workflow consists of two parts: first, the Equivalence score (Eq. score), a straightforward yet sophisticated metric highlighting relevant deviations from negative controls based on cell image morphology; second, the scalability of the workflow, by utilizing the Eq. scores on a large scale to predict and classify the subtle morphological changes in cell image profiles. By doing so, we show classification improvements compared to using the raw CellProfiler features on the CPJUMP1-pilot dataset on three types of perturbations.
We hope that our workflow’s contributions will enhance drug screening efficiency and streamline the drug development process. As this process is resource-intensive, every incremental improvement is valuable. Through our collective efforts in advancing the understanding of high-throughput image-based data, we aim to reduce both the time and cost of developing new, life-saving treatments.
Citations: 0
Toward effective SVM sample reduction based on fuzzy membership functions
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-09-10 DOI: 10.1016/j.chemolab.2024.105233
Tinghua Wang, Daili Zhang, Hanming Liu
Volume 254, Article 105233
Abstract: Support vector machine (SVM) is known for its good generalization performance and wide application in various fields. Despite its success, the learning efficiency of SVM decreases significantly as the number of training samples grows rapidly. Consequently, the traditional SVM with standard optimization methods faces challenges such as excessive memory requirements and slow training speed, especially for large-scale training sets. To address this issue, this paper draws inspiration from the fuzzy support vector machine (FSVM). Considering that each sample contributes differently to the decision plane, we propose an effective SVM sample reduction method based on the fuzzy membership function (FMF). This method uses the FMF to calculate the fuzzy membership of each training sample. Training samples with low fuzzy memberships are then deleted. Specifically, we propose SVM sample reduction algorithms based on class center distance, kernel target alignment, centered kernel alignment, slack factor, entropy, and bilateral weighted FMF, respectively. Comprehensive experiments on UCI and KEEL datasets demonstrate that our proposed algorithms outperform other comparative methods in terms of accuracy, F-measure, and hinge-loss measures.
Citations: 0
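The class-center-distance variant mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the membership form commonly used in the fuzzy SVM literature, s_i = 1 - d_i / (r + delta), where d_i is the distance of sample i to its own class center and r is the largest such distance in the class. The paper's exact membership function, the `keep_fraction` parameter, and the function names are assumptions, not the authors' implementation.

```python
import math


def class_center_membership(points, delta=1e-6):
    """Fuzzy membership of each point from its distance to the class center."""
    dim = len(points[0])
    center = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    dists = [math.dist(p, center) for p in points]
    radius = max(dists)
    # Points near the center get membership close to 1, far points close to 0.
    return [1.0 - d / (radius + delta) for d in dists]


def reduce_class(points, keep_fraction=0.75):
    """Keep the `keep_fraction` of points with the highest membership."""
    memberships = class_center_membership(points)
    n_keep = max(1, int(round(keep_fraction * len(points))))
    ranked = sorted(range(len(points)), key=lambda i: memberships[i], reverse=True)
    kept = sorted(ranked[:n_keep])  # preserve the original sample order
    return [points[i] for i in kept]


# Toy positive class with one point far from the others.
positives = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (3.0, 3.0)]
reduced = reduce_class(positives, keep_fraction=0.75)
```

Here the distant point receives the lowest membership and is removed before training. In practice the reduction would be applied to each class separately and the reduced set passed to any standard SVM trainer.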
HEnsem_DTIs: A heterogeneous ensemble learning model for drug-target interactions prediction
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-09-02 DOI: 10.1016/j.chemolab.2024.105224
Mohammad Reza Keyvanpour, Yasaman Asghari, Soheila Mehrmolaei
Volume 253, Article 105224
Abstract: Drug discovery is the process by which a drug is discovered. Drug-target interaction prediction is a major part of drug discovery. Unfortunately, producing new drugs is time-consuming and expensive because it requires a lot of human and laboratory resources. Recently, predictions have been made using computational methods to solve these problems and avoid blindly examining all interactions. Experience with computational methods shows that no single algorithm is suitable for all applications; hence, ensemble learning is used. Although various ensemble methods have been proposed, it is still not easy to find a suitable ensemble method for a particular dataset. In general, the algorithms used in the aggregation and combination methods are selected manually based on experience. Reinforcement learning can be one way to meet this challenge. High-dimensional feature space and class imbalance are among the challenges of drug-target interaction prediction. This paper proposes HEnsem_DTIs, a heterogeneous ensemble model for predicting drug-target interactions that uses dimensionality reduction and concepts from recommender systems to address these challenges. HEnsem_DTIs is configured with reinforcement learning. Dimensionality reduction is applied to handle the high-dimensional feature space, and recommender systems are used to improve under-sampling and address class imbalance. Six datasets are used to evaluate the proposed model; the results show that HEnsem_DTIs performs better than other models in this field. Evaluation of the proposed model on the first dataset using 10-fold cross-validation shows a sensitivity of 0.896, specificity of 0.954, GM of 0.924, AUC of 0.930, and AUPR of 0.935.
Citations: 0
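The reported figures above can be cross-checked with a short calculation, assuming that GM denotes the geometric mean of sensitivity and specificity. The abstract does not define GM, so this reading is an assumption, although it is the usual one for imbalanced classification.

```python
import math

# Values reported in the abstract for the first dataset.
sensitivity = 0.896
specificity = 0.954

# Geometric mean of the two rates (assumed definition of GM).
g_mean = math.sqrt(sensitivity * specificity)
print(round(g_mean, 4))
```

The result is approximately 0.9245, which agrees with the reported GM of 0.924 up to the rounding of the inputs.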
Random projection ensemble conformal prediction for high-dimensional classification
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-09-02 DOI: 10.1016/j.chemolab.2024.105225
Xiaoyu Qian, Jinru Wu, Ligong Wei, Youwu Lin
Volume 253, Article 105225
Abstract: In classification problems, many models with superior performance fail to provide confidence estimates or intervals for each prediction. This lack of reliability poses risks in real-world applications, making these models difficult to trust. Conformal prediction methods, as distribution-free and model-free approaches with a finite-sample coverage guarantee, have recently been widely used to construct prediction sets for classification models. However, traditional conformal prediction methods only produce set-valued results without specifying a definitive predicted class. Particularly in complex settings, these methods fail to assist models in effectively addressing challenges such as high dimensionality, resulting in ambiguous prediction sets with low statistical efficiency, i.e., prediction sets that contain many false classes. In this study, a novel Ensemble Conformal Prediction algorithm based on Random Projection and a designed voting strategy, RPECP, is developed to tackle these challenges. Initially, a procedure for selecting the approximately oracle random projections and classifiers is executed to best leverage the internal information and structure of the data. Subsequently, based on the approximately oracle random projections and underlying classifiers, conformal prediction is performed on new test samples in a lower-dimensional space, resulting in multiple independent prediction sets. Finally, an accurate predicted class and a precise prediction set with high coverage and statistical efficiency are produced through a designed voting strategy. Compared to several base classifiers, RPECP obtains higher classification accuracy; against other conformal prediction algorithms, it achieves less ambiguous prediction sets with fewer false classes while guaranteeing high coverage. For illustration, this paper demonstrates RPECP's superiority over other methods in four cases: two high-dimensional settings and two real-world datasets.
Citations: 0
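The set-valued output that the abstract above refers to can be illustrated with basic split conformal prediction. The sketch below is not RPECP: it has no random projections, classifier selection, or voting. It only shows how a calibration set turns class probabilities into prediction sets. The nonconformity score (one minus the probability of the true class), the toy probabilities, and the function names are assumptions for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Quantile of calibration nonconformity scores for target coverage 1 - alpha."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return 1.0  # too few calibration points: every class is included
    return scores[k - 1]


def prediction_set(probs, qhat):
    """All classes whose nonconformity score does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]


# Toy calibration data: predicted class probabilities and true labels.
cal_probs = [
    [0.9, 0.05, 0.05], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1],
    [0.2, 0.6, 0.2], [0.05, 0.15, 0.8], [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1], [0.1, 0.85, 0.05], [0.2, 0.1, 0.7],
]
cal_labels = [0, 0, 1, 1, 2, 2, 0, 1, 2]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)

confident = prediction_set([0.9, 0.05, 0.05], qhat)  # a clear-cut test sample
ambiguous = prediction_set([0.5, 0.45, 0.05], qhat)  # two plausible classes
```

The second test sample yields a set with two classes and no single predicted label, which is the kind of ambiguity the abstract says RPECP is designed to reduce.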
G-CovSel: Covariance oriented variable clustering
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-08-29 DOI: 10.1016/j.chemolab.2024.105223
Jean-Michel Roger, Alessandra Biancolillo, Bénédicte Favreau, Federico Marini
Volume 254, Article 105223
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0169743924001631/pdfft?md5=52fb71b18968f61fe29df549f8fc05f7&pid=1-s2.0-S0169743924001631-main.pdf
Abstract: Dimensionality reduction is an essential step in the processing of analytical chemistry data. When this reduction is carried out by variable selection, it can enable the identification of biochemical pathways. CovSel has been developed to meet this requirement, through a parsimonious selection of non-redundant variables. This article presents the g-CovSel method, which modifies the CovSel algorithm to produce highly complementary groups containing highly correlated variables. This modification requires the theoretical definition of the groups' construction and of the deflation of the data with respect to the selected groups. Two applications, on two extreme case studies, are presented. The first, based on near-infrared spectra related to four chemicals, demonstrates the relevance of the selected groups and the method's ability to handle highly correlated variables. The second, based on genomic data, demonstrates the method's ability to handle very highly multivariate data. Most of the groups formed can be interpreted from a functional point of view, making g-CovSel a tool of choice for biomarker identification in omics. Further work will be carried out to generalize g-CovSel to multi-block and multi-way data.
Citations: 0
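The select-then-deflate loop that the abstract above builds on can be sketched for plain CovSel, as it is usually described: pick the variable with the largest squared covariance with the responses, then project that variable out of both the predictors and the responses before the next pick. The sketch does not reproduce g-CovSel's group construction or group-wise deflation, which the abstract mentions but does not specify. The data layout (lists of columns), the function names, and the toy data are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(col):
    mean = sum(col) / len(col)
    return [v - mean for v in col]


def deflate(col, ref):
    """Remove from `col` its projection onto the selected variable `ref`."""
    coef = dot(col, ref) / dot(ref, ref)
    return [v - coef * r for v, r in zip(col, ref)]


def covsel(x_cols, y_cols, n_select):
    """Return indices of selected variables (plain CovSel-style sketch)."""
    X = [center(c) for c in x_cols]
    Y = [center(c) for c in y_cols]
    selected = []
    for _ in range(n_select):
        # Squared covariance (up to a constant) of each variable with all responses.
        crit = [sum(dot(xc, yc) ** 2 for yc in Y) for xc in X]
        for j in selected:
            crit[j] = -1.0  # already-selected columns are excluded explicitly
        best = max(range(len(X)), key=lambda j: crit[j])
        selected.append(best)
        ref = X[best][:]
        X = [deflate(c, ref) for c in X]
        Y = [deflate(c, ref) for c in Y]
    return selected


# Toy data: x1 is a near-duplicate of x0; y depends on x0 and x2.
x0 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [1.1, 1.9, 3.2, 3.9, 5.1]
x2 = [1.0, -2.0, 2.0, -2.0, 1.0]
y = [3.0, 2.0, 8.0, 6.0, 11.0]  # equals 2 * x0 + x2
picked = covsel([x0, x1, x2], [y], n_select=2)
```

With this toy data the first pick is one of the two near-duplicate variables (x1), and after deflation the other (x0) carries almost no remaining covariance, so the second pick is x2. This is the non-redundancy property the abstract attributes to CovSel; g-CovSel would instead keep the correlated variables together as one group.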
Enhancing quantitative ¹H NMR model generalizability on honey from different years through partial least squares subspace and optimal transport based unsupervised domain adaptation
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-08-28 DOI: 10.1016/j.chemolab.2024.105221
Peng Shan, Hongming Xiao, Xiang Li, Ruige Yang, Lin Zhang, Yuliang Zhao
Volume 254, Article 105221
Abstract: Honey is a nourishing and natural food product that is widely favored by a diverse group of consumers. Proton Nuclear Magnetic Resonance (¹H NMR) is a powerful tool for quantitative analysis of honey and plays a crucial role in ensuring its quality. The ¹H NMR technique requires multivariate calibration models for the quantitative analysis of key compounds present in honey. However, maintaining consistent measurement conditions across different years is scarcely possible, which can significantly shift the distributions of the training and test spectra and ultimately reduce the performance of predictive models. Unsupervised domain adaptation (UDA) methods have gained considerable attention for their ability to match distribution differences between the labeled source spectra and the unlabeled target spectra without costly annotation. To enhance the quantitative model generalizability on honey from different years, we propose a UDA method known as partial least squares subspace and optimal transport-based UDA (PLSS-OT-UDA). This approach eliminates distribution differences between the source subspace and target subspace via partial least squares (PLS) dimensionality reduction and optimal transport (OT). First, the optimal latent variable weight matrix is extracted from the source domain (i.e., labeled ¹H NMR data from 2017) with PLS. Next, the dimension of both the source and target domains (i.e., unlabeled ¹H NMR data from 2018) is reduced, and their corresponding subspaces are obtained with the weight matrix of the source domain. Finally, OT is employed to align the distributions of the source and target domains within the subspace. Experimental results on the honey dataset demonstrate that PLSS-OT-UDA outperforms traditional methods, including transfer component analysis (TCA), optimal transport for domain adaptation (OTDA), domain adaptation based on principal component analysis and optimal transport (PCA-OTDA), and subspace alignment (SA), with respect to generalization performance on three components: Baumé degree, sugar content, and water content.
Citations: 0
Analyzing topological descriptors of guar gum and its derivatives for predicting physical properties in carbohydrates
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-08-24 DOI: 10.1016/j.chemolab.2024.105203
Xiujun Zhang, Shamaila Yousaf, Anisa Naeem, Ferdous M. Tawfiq, Adnan Aslam
Volume 253, Article 105203
Abstract: Guar gum is a non-ionic polysaccharide found in abundance in nature. It may be used as a thickening agent, stabilizer, or emulsifier in pharmaceutical formulations, food products, or cosmetics. Its ability to form viscous solutions makes it useful in drug delivery systems, controlled release formulations, and as a matrix for oral drug delivery. The investigation of chemical structures through graph invariants is of great interest. Topological descriptors are numerical values associated with the molecular structure and have the ability to predict certain physical and chemical properties of the underlying structure. In this paper, we have calculated the harmonic index, the inverse sum indeg index, the third Zagreb index, the hyper Zagreb index, the sigma index, the reformulated first Zagreb index, the reformulated multiplicative first Zagreb index, the harmonic-arithmetic index, and the atom-bond sum-connectivity indices of guar gum and its chemical derivatives. Finally, the chemical applicability of these topological descriptors is checked for different carbohydrates (monosaccharides, disaccharides, and polysaccharides) by using straight-line, parabolic, and logarithmic regression models. It has been observed that these topological descriptors are useful for predicting two physical properties, namely density and molecular weight.
Citations: 0
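Three of the degree-based descriptors named in the abstract above can be computed on a toy molecular graph as follows. The sketch assumes the definitions commonly given in the chemical graph theory literature: harmonic index = sum over edges of 2 / (d_u + d_v), inverse sum indeg index = sum of d_u * d_v / (d_u + d_v), and hyper Zagreb index = sum of (d_u + d_v)^2, where d_u is the degree of vertex u. The graph is a small five-vertex skeleton chosen for illustration, not guar gum, and the function name is hypothetical.

```python
from collections import Counter


def degree_indices(edges):
    """Harmonic, inverse sum indeg, and hyper Zagreb indices of a simple graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    harmonic = sum(2.0 / (degree[u] + degree[v]) for u, v in edges)
    inverse_sum_indeg = sum(
        degree[u] * degree[v] / (degree[u] + degree[v]) for u, v in edges
    )
    hyper_zagreb = sum((degree[u] + degree[v]) ** 2 for u, v in edges)
    return harmonic, inverse_sum_indeg, hyper_zagreb


# Hydrogen-suppressed skeleton with one branch: vertex 1 has degree 3.
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]
harmonic, isi, hyper_zagreb = degree_indices(edges)
```

For this graph the edge degree pairs are (1, 3), (3, 1), (3, 2), and (2, 1), giving a harmonic index of 31/15, an inverse sum indeg index of 101/30, and a hyper Zagreb index of 66. Values like these, computed for each carbohydrate, would be the predictors in the regression models the abstract describes.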
Interpretation of high dimensional definitive screening designs assisted by bootstrapped partial least squares regression
IF 3.7, Zone 2, Chemistry
Chemometrics and Intelligent Laboratory Systems Pub Date: 2024-08-24 DOI: 10.1016/j.chemolab.2024.105218
Knut Dyrstad, Frank Westad
Volume 253, Article 105218
Abstract: Definitive screening design (DSD) has become a widely used type of Design of Experiments (DOE) for chemical, pharmaceutical, and biopharmaceutical process and product development due to its optimization properties, with estimation of main, interaction, and squared variable effects from a minimum number of experiments. These high-dimensional DOEs, with more variables than samples and with partly correlated variables, frequently make the statistical interpretation challenging. The purpose of the study was to test bootstrap PLSR using a heredity procedure to select the variable subset to be finally evaluated by MLR. The heredity selection was applied to bootstrap T values, given by the original PLSR coefficients (B) divided by the bootstrap-estimated standard deviation. The investigated fractional weighted and non-parametric bootstrap PLSR resulted in the same variable selection outcome and final models in this study.
A simulation study with 7 main variables and 12 real-data DSDs from the literature with 4, 5, 7, and 8 main variables showed improved model performance for small and particularly for large DSDs for the bootstrap PLSR MLR methods compared to two common DSD reference methods: DSD fit definitive screening and AICc forward stepwise regression (AICc FSR). Variable selection accuracy and predictive ability were significantly improved by the investigated method in 6 out of 13 DSDs compared to the best model from either of the two reference methods. The remaining 7 DSDs gave the same model as the best reference model. Strong heredity was found to provide the best models for all real data in this study. The use of the heredity procedure on the percent non-zero SVEM FSR variable effects followed by MLR showed promising results. AICc Lasso regression was among the other methods partially tested and was found to set almost all variable effects to zero when tested on three large minimum DSDs. While the DSD fit definitive screening method may often be the first choice for DSD, the heredity bootstrap PLSR MLR and heredity SVEM FSR MLR may be alternative methods to improve variable selection and model precision.
Citations: 0
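The bootstrap T value described in the abstract above, a coefficient from the original fit divided by the standard deviation of its bootstrap estimates, can be sketched in a much simpler setting. The example swaps in an ordinary least-squares slope for a single factor in place of PLSR coefficients, uses a plain non-parametric bootstrap, and omits the heredity step entirely. The data, function names, and number of resamples are assumptions for illustration.

```python
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_t(xs, ys, n_boot=500, seed=0):
    """Original coefficient divided by the bootstrap standard deviation."""
    rng = random.Random(seed)
    b = slope(xs, ys)  # coefficient from the original data
    n = len(xs)
    boot = []
    while len(boot) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample runs with replacement
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # degenerate resample: the slope is undefined
        boot.append(slope(bx, [ys[i] for i in idx]))
    return b / statistics.stdev(boot)


# Toy three-level factor with one active and one inert response.
xs = [-1, -1, 0, 0, 1, 1, -1, 1]
ys_active = [-2.1, -1.8, 0.2, -0.1, 2.0, 2.2, -1.9, 1.9]
ys_inert = [0.3, -0.2, 0.1, -0.3, -0.1, 0.2, 0.0, 0.1]
t_active = bootstrap_t(xs, ys_active)
t_inert = bootstrap_t(xs, ys_inert)
```

The active effect gives a large absolute T value and the inert one a small value, which is the kind of ranking a variable-selection rule such as the heredity procedure would then act on.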