Haiyang Ye, Yunyi Zhang, Zilong Li, Yue Peng, Peng Zhou
{"title":"Comprehensive evaluation and systematic comparison of Gaussian process (GP) modelling applications in peptide quantitative structure-activity relationship","authors":"Haiyang Ye, Yunyi Zhang, Zilong Li, Yue Peng, Peng Zhou","doi":"10.1016/j.chemolab.2024.105191","DOIUrl":"10.1016/j.chemolab.2024.105191","url":null,"abstract":"<div><p>Peptide quantitative structure-activity relationship (pQSAR) is a specific extension of traditional QSARs from small-molecule drugs to bioactive peptides. Since peptides are linear biopolymers that differ fundamentally from small-molecule compounds in structural features such as ordered sequence, large size and intrinsic flexibility, the pQSAR methodology (including structural characterization and regression modelling) should be further exploited relative to traditional QSARs. Gaussian process (GP) serves as a pioneering Bayesian-based machine learning (ML) solution for tackling linear/nonlinear-hybrid regression issues in intricate domains. However, the applications of GP regression in QSAR and, particularly, pQSAR remain largely unexplored to date. In this work, we launched a comprehensive pQSAR study with GP regression modelling, aiming at an in-depth evaluation of GP performance based on different characterizations and a systematic comparison of GP with other routine ML methods. Here, we culled two distinct classes of peptide datasets, which separately comprise 12 panels of sophisticated benchmarks and 46 panels of extended samples, containing 8804 peptide samples in total and systematically resulting in 522 regression models. Our study indicated that GP can generally provide an effective solution for many pQSAR problems, with the potential to promote ML regression modelling in this area, and that it is comparable with or even better than widely used methods on both the sophisticated benchmarks and extended samples. 
In addition, GP also has many advantages as compared to traditional MLs, such as hyperparameter self-consistency, overfitting resistance, interpretable output and estimable uncertainty.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105191"},"PeriodicalIF":3.7,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
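The "estimable uncertainty" mentioned in the abstract above can be illustrated with a minimal GP regression sketch. This is not the authors' pQSAR pipeline: the squared-exponential kernel, the fixed hyperparameters (`length_scale`, `signal_var`, `noise_var`) and the one-dimensional toy inputs standing in for peptide descriptors are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise_var=1e-4, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(X_train, X_train, **kernel_args) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, **kernel_args)
    K_ss = rbf_kernel(X_test, X_test, **kernel_args)
    L = np.linalg.cholesky(K)                       # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)      # latent-function variance
    return mean, np.maximum(var, 0.0)

# Toy data (hypothetical): one descriptor per sample, activity = sin(descriptor).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.sin(X_train).ravel()
X_test = np.array([[1.5], [10.0]])                  # inside vs. far outside the data
mean, var = gp_predict(X_train, y_train, X_test)
mean_at_train, _ = gp_predict(X_train, y_train, X_train)
```

The predictive variance is small near the training inputs and reverts to the prior variance far from them, which is how a GP signals that a prediction lies outside the region covered by the data.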
{"title":"Capturing connectivity information from process flow diagrams by sequential-orthogonalized PLS to improve soft-sensor performance","authors":"Qiang Zhu , Pierantonio Facco , Zhonggai Zhao , Massimiliano Barolo","doi":"10.1016/j.chemolab.2024.105192","DOIUrl":"10.1016/j.chemolab.2024.105192","url":null,"abstract":"<div><p>In the development of data-driven soft sensors for product quality assessment in multi-unit manufacturing processes, the only information that is typically used as an input to the model is real-time measurements from field sensors. However, even if detailed knowledge of the mechanistic behavior of the process may not be available, information about the sequence of processing units, and their connectivity, is available, typically in graphical form through process flow diagrams. In this study, we investigate the use of sequential-orthogonalized partial least-squares (SO-PLS) regression as a way to capture connectivity information from a process flow diagram, and transfer it into a data-driven model to be used as a soft sensor in a multi-unit process. Connectivity between units is captured and translated into a block order that establishes a sequence for block regressions. Orthogonalization between two blocks is then carried out with the aim of eliminating overlapping data and retaining information that is unique to each block. Product quality is finally predicted by summing the contributions from each block, and the accuracy of prediction is enhanced due to the embedded dual feature-extraction procedure, which combines orthogonalization and latent-variable extraction. The effectiveness of the proposed approach is illustrated by comparing the quality prediction performance of two soft sensors for a simulated multi-unit continuous process: one using standard PLS and one using SO-PLS. 
Superior performance of the SO-PLS soft sensor is achieved, even more markedly so when fewer field measurements are available to build the soft sensor.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105192"},"PeriodicalIF":3.7,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169743924001321/pdfft?md5=21cec59850f044691a24a0f6930904cf&pid=1-s2.0-S0169743924001321-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141930611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peng Chen , Jianmin Huang , Chenghao Fei , Rao Fu , Min Wei , Hong Zhang , Chang Liu , Qiaosheng Guo , Hongzhuan Shi
{"title":"Tracing the origin of isatidis radix based on multivariate data fusion combined with DBN classification algorithm","authors":"Peng Chen , Jianmin Huang , Chenghao Fei , Rao Fu , Min Wei , Hong Zhang , Chang Liu , Qiaosheng Guo , Hongzhuan Shi","doi":"10.1016/j.chemolab.2024.105190","DOIUrl":"10.1016/j.chemolab.2024.105190","url":null,"abstract":"<div><p>In this study, multidimensional characterization data such as chromaticity value, texture and compositional content of Isatidis Radix from different regions (Anhui; Hubei; Shaanxi; Xinjiang) were collected. By multivariate statistical analysis, 44 characterization factors (VIP > 1, <em>P</em> < 0.05) were selected to distinguish the origin of Isatidis Radix. In addition, a unique artificial intelligence algorithm was created and optimized by merging the 44 characterization factors with the deep belief network (DBN) classification algorithm. Compared with the traditional discriminant analysis method, the accuracy of this new method was significantly improved: both the discrimination rate of Isatidis Radix origin and the traceability accuracy of Isatidis Radix reached 100 %. This study supports the development of intelligent algorithms based on data fusion to track the origin of more agricultural products.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105190"},"PeriodicalIF":3.7,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141841877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongquan Ji, Qingsen Hou, Yingxuan Shao, Yuhao Zhang
{"title":"Incipient fault detection for dynamic processes with canonical variate residual statistics analysis","authors":"Hongquan Ji, Qingsen Hou, Yingxuan Shao, Yuhao Zhang","doi":"10.1016/j.chemolab.2024.105189","DOIUrl":"10.1016/j.chemolab.2024.105189","url":null,"abstract":"<div><p>In modern complex industrial operations, timely fault detection is imperative. While statistical process monitoring is widely used in practice, conventional approaches are usually insensitive to incipient faults (IFs) whose magnitudes are not obvious. To this end, an innovative approach is presented for IF detection in dynamic processes. To begin with, canonical variate residuals (CVRs) are generated by using the canonical variate dissimilarity analysis (CVDA) algorithm. The next step involves calculating statistics for the CVRs and arranging a corresponding statistic matrix. Afterward, the Mahalanobis distance index is constructed for fault detection purpose. The main reasons that this developed approach possesses high sensitivity to IFs in dynamic processes lie in the utilization of CVDA and the idea of monitoring extracted statistics rather than original residuals. Finally, its effectiveness and merits are demonstrated via a numerical example and a benchmark process.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105189"},"PeriodicalIF":3.7,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141845889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
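The monitoring idea in the abstract above (compute statistics of the residuals, arrange them in a statistic matrix, then build a Mahalanobis distance index) can be sketched as follows. This is a simplified illustration, not the authors' method: the CVDA step is omitted and the residuals are simulated Gaussian noise, the choice of window mean and variance as the statistics, the window width, and the empirical 99 % control limit are all assumptions.

```python
import numpy as np

def window_statistics(residuals, width):
    """Stack the mean and variance of each residual channel over a sliding window."""
    rows = []
    for end in range(width, residuals.shape[0] + 1):
        window = residuals[end - width:end]
        rows.append(np.concatenate([window.mean(axis=0), window.var(axis=0, ddof=1)]))
    return np.asarray(rows)

def mahalanobis_index(stats, center, cov_inv):
    """Squared Mahalanobis distance of each statistic row from the reference centre."""
    diff = stats - center
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

rng = np.random.default_rng(0)
normal = rng.normal(size=(2000, 2))           # stand-in for fault-free residuals
faulty = rng.normal(size=(200, 2)) + 1.0      # mean shift of one noise standard deviation

width = 50
ref = window_statistics(normal, width)        # reference statistic matrix
center = ref.mean(axis=0)
cov_inv = np.linalg.pinv(np.cov(ref, rowvar=False))

# Empirical control limit from the fault-free data (an assumed choice).
limit = np.quantile(mahalanobis_index(ref, center, cov_inv), 0.99)
d_fault = mahalanobis_index(window_statistics(faulty, width), center, cov_inv)
alarm_rate = float(np.mean(d_fault > limit))
```

Averaging over a window shrinks the noise in the monitored statistic, so a shift that is hard to see in individual residual samples becomes large relative to the spread of the window means. This is one plausible reading of why monitoring extracted statistics rather than the original residuals increases sensitivity.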
Ali R. Jalalvand , Maziar Farshadnia , Faramarz Jalili , Cyrus Jalili
{"title":"A novel, intelligent and computer-assisted electrochemical sensor for extraction and simultaneous determination of patulin and citrinin in apple and pear fruit samples","authors":"Ali R. Jalalvand , Maziar Farshadnia , Faramarz Jalili , Cyrus Jalili","doi":"10.1016/j.chemolab.2024.105188","DOIUrl":"10.1016/j.chemolab.2024.105188","url":null,"abstract":"<div><p>In this work, a novel electrochemical sensor was fabricated for simultaneous determination of patulin (PT) and citrinin (CT) in apple and pear fruit samples. A glassy carbon electrode (GCE) was modified with graphene-multiwalled carbon nanotubes-ionic liquid (Gr-MWCNTs-IL), which was used as a platform for the electrochemical synthesis of molecularly imprinted polymers (MIPs) by using PT and CT as templates, maleic acid as a functional monomer, and ethylene glycol dimethacrylate as a cross-linker, with the aim of preconcentration and simultaneous determination of PT and CT. Experimental variables affecting fabrication of the structure of the sensor and the hydrodynamic differential pulse voltammetric (HDPV) response of the sensor were optimized by a small central composite design and desirability function. After optimization, the HDPV responses of the sensor were calibrated by multivariate calibration methods in the ranges of 0.5–13 fM and 1.5–18 fM for PT and CT, respectively, with the help of PLS-1, RBF-PLS, rPLS, LS-SVM, and RBF-ANN, with the aim of selecting the best algorithm to assist the sensor. Our results confirmed that the best performance was obtained with RBF-ANN, which was used for the analysis of apple and pear fruit samples. The limits of detection of the sensor assisted by RBF-ANN for the determination of PT and CT were 0.08 and 0.61 fM, respectively. 
Several commercial brands were analyzed using the sensor assisted by RBF-ANN and by HPLC-UV, and the results confirmed that the performance of the sensor was admirable and comparable with the reference method, with lower cost, faster response, and an easier procedure, making it a reliable alternative method for the simultaneous determination of PT and CT in real matrices.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105188"},"PeriodicalIF":3.7,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141779760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arnoud Jochemsen , Gry Alfredsen , Harald Martens , Ingunn Burud
{"title":"Exploring the use of extended multiplicative scattering correction for near infrared spectra of wood with fungal decay","authors":"Arnoud Jochemsen , Gry Alfredsen , Harald Martens , Ingunn Burud","doi":"10.1016/j.chemolab.2024.105187","DOIUrl":"10.1016/j.chemolab.2024.105187","url":null,"abstract":"<div><p>Extended Multiplicative Signal Correction (EMSC) is a multivariate linear modelling technique for multi-channel measurements that can identify and correct for different types of systematic variation patterns, known or unknown. It is typically used for pre-processing to separate light absorbance spectra, obtained by diffuse reflectance of intact samples, into three main sources of variation: additive variations due to chemical composition (≈Beer's law), mixed multiplicative and additive variations due to physical light scattering (≈Lambert's law) and more or less random measurement noise. The present work evaluates the use of EMSC to pre-process near infrared spectra obtained by hyperspectral imaging of Scots pine sapwood, inoculated with two different basidiomycete fungi and at various degradation stages. The spectral changes due to fungal decay and resulting mass loss are assessed by interpretation of the EMSC parameters and the partial least squares regression (PLSR) results. Including a cellulose (analyte) or bound water (interferent) spectral profile in the EMSC pre-processing model generally improves the predictive performance of the PLS modelling, but it can also make it worse. The inclusion of the additional polynomial baselines does not necessarily lead to a better separation of the physical and chemical effects present in the spectra. The estimated EMSC parameters provide insight into the differences in decay mechanisms. 
A detailed analysis of the EMSC results highlights advantages and disadvantages of using a complex pre-processing model.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105187"},"PeriodicalIF":3.7,"publicationDate":"2024-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169743924001278/pdfft?md5=539eb3ac5e36684f422400bcc2d57271&pid=1-s2.0-S0169743924001278-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141779761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
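The separation of additive and multiplicative effects described in the abstract above can be sketched with a basic EMSC model, in which each spectrum is regressed on a reference spectrum plus polynomial baselines and then corrected. This is a minimal sketch under stated assumptions: the function name `emsc`, the use of a single reference spectrum, the polynomial order and the synthetic data are illustrative, and the analyte (cellulose) and interferent (bound water) profiles mentioned in the abstract are not included.

```python
import numpy as np

def emsc(spectra, reference, wavelengths, poly_order=2):
    """Basic EMSC: fit z = b * reference + polynomial baseline, then correct.

    spectra: array of shape (n_spectra, n_wavelengths).
    Returns the corrected spectra and the fitted coefficients per spectrum,
    ordered as [b, a0, a1, ..., a_poly_order].
    """
    # Scale wavelengths to [-1, 1] so the polynomial terms are well conditioned.
    span = wavelengths.max() - wavelengths.min()
    lam = 2.0 * (wavelengths - wavelengths.min()) / span - 1.0
    baselines = [lam**k for k in range(poly_order + 1)]     # 1, lam, lam^2, ...
    design = np.column_stack([reference] + baselines)
    coefs, *_ = np.linalg.lstsq(design, spectra.T, rcond=None)
    b = coefs[0]                                            # multiplicative term
    baseline = (design[:, 1:] @ coefs[1:]).T                # additive baseline
    corrected = (spectra - baseline) / b[:, None]
    return corrected, coefs.T

# Synthetic example (hypothetical): one absorbance band distorted by scatter.
wavelengths = np.linspace(1000.0, 2500.0, 200)
reference = np.exp(-(((wavelengths - 1900.0) / 80.0) ** 2)) + 0.2
distorted = 1.5 * reference + 0.3 + 0.0002 * (wavelengths - 1000.0)
corrected, params = emsc(distorted[None, :], reference, wavelengths)
```

Additional known spectral profiles could be appended as extra columns of `design`. As the abstract notes, a more complex model does not necessarily separate physical and chemical effects better.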
{"title":"Empowerments of blood cancer therapeutics via molecular descriptors","authors":"K. Pattabiraman","doi":"10.1016/j.chemolab.2024.105180","DOIUrl":"10.1016/j.chemolab.2024.105180","url":null,"abstract":"<div><p>Cancer is a disease caused by cellular alterations that lead to unrestrained cell growth and division. The physical, chemical, and biological properties of many anticancer medications, including those used to treat blood, breast, and skin cancer, can be predicted. This paper presents novel distance-based topological indices (TIs) computed using the suggested KP-polynomial with blood cancer drugs. The objective of the QSPR investigation is to determine the mathematical correlation between the analyzed properties (such as Molar Volume, Refractive Index, etc.) and different descriptors associated with the molecular structure of the medications. A polynomial regression model is employed to assess the predictive capability of TIs. The results are represented using a correlation coefficient to establish the connection between the predicted and observed values of blood cancer drugs. This theoretical method could potentially enable chemists and health care professionals to anticipate the characteristics of blood cancer drugs without the need for actual experimental tests. 
This opens new opportunities and paves the way for drug discovery and for the formulation of an efficient multicriteria decision-making technique, TOPSIS, for ranking these drugs and their physicochemical characteristics.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105180"},"PeriodicalIF":3.7,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141852368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhenghui Feng , Hanli Jiang , Ruiqi Lin , Wanying Mu
{"title":"Moving window sparse partial least squares method and its application in spectral data","authors":"Zhenghui Feng , Hanli Jiang , Ruiqi Lin , Wanying Mu","doi":"10.1016/j.chemolab.2024.105178","DOIUrl":"10.1016/j.chemolab.2024.105178","url":null,"abstract":"<div><p>With the advancement of data science and technology, the complexity and diversity of data have increased. Challenges arise when dealing with a larger number of variables than the sample size or the presence of multicollinearity due to strong correlations among variables. In this paper, we propose a moving window sparse partial least squares method that combines the sliding interval technique with sparse partial least squares. By utilizing sliding interval partial least squares regression to identify the optimal interval and incorporating sparse partial least squares for variable selection, the proposed method offers innovations compared to traditional partial least squares (PLS) approaches. Monte Carlo simulations demonstrate its performance in variable selection and model prediction. We apply the method to seawater spectral data, predicting the concentration of chemical oxygen demand. The results show that the method not only selects reasonable spectral wavelength intervals but also enhances predictive performance.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105178"},"PeriodicalIF":3.7,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141693158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aline Emmer Ferreira Furman , Alexandre de Fátima Cobre , Dile Pontarolo Stremel , Roberto Pontarolo
{"title":"A new and fast method for diabetes and dyslipidemia diagnosis using FTIR-MIR, spectroscopy and multivariate data analysis: A proof of concept","authors":"Aline Emmer Ferreira Furman , Alexandre de Fátima Cobre , Dile Pontarolo Stremel , Roberto Pontarolo","doi":"10.1016/j.chemolab.2024.105179","DOIUrl":"10.1016/j.chemolab.2024.105179","url":null,"abstract":"<div><p>Diabetes and dyslipidemia are well-established risk factors for cardiovascular disease, which is the primary cause of death both in Brazil and globally. Fourier-transform mid-infrared spectroscopy (FTIR-MIR) generates spectral fingerprints of biomolecules, allowing for correlation with metabolic changes, while remaining a rapid, non-invasive, and non-destructive method. The study provided a proof of concept for the effectiveness of FTIR-MIR in screening diabetes, pre-diabetes, hypercholesterolemia, hypertriglyceridemia, and mixed dyslipidemia in blood serum. After acquiring mid-infrared spectra of 60 human serum samples, both unsupervised and supervised analysis models were developed. Principal component analysis (PCA) was used for pattern recognition and to determine how closely related the samples were based on their spectral profiles. The results obtained by the supervised models showed a clear discriminative ability to distinguish both diabetic and dyslipidemic samples from healthy subjects by multivariate analysis performed on FTIR-MIR spectra. High accuracy rates of more than 90 % were achieved for diabetes and dyslipidemia diagnosis with PLS-DA. 
Dyslipidemia type discrimination could be attributed mainly to the amide I region [1720-1600 cm<sup>−1</sup>, ν(C=O)] and altered lipid concentration in the 3000-2800 cm<sup>−1</sup> region, whereas the discrimination of diabetes and prediabetes was primarily due to altered protein conformation in the amide I [1720-1600 cm<sup>−1</sup>, ν(C=O)] and amide II [1570-1480 cm<sup>−1</sup>, δ(N-H) + ν(CH)] ranges.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105179"},"PeriodicalIF":3.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141701142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jun Tian , Ming Li , Zhiyi Tan , Meng Lei , Lin Ke , Liang Zou
{"title":"Intelligent non-destructive measurement of coal moisture via microwave spectroscopy and chemometrics","authors":"Jun Tian , Ming Li , Zhiyi Tan , Meng Lei , Lin Ke , Liang Zou","doi":"10.1016/j.chemolab.2024.105175","DOIUrl":"10.1016/j.chemolab.2024.105175","url":null,"abstract":"<div><p>The rapid and non-destructive measurement of coal moisture content is essential in the coal industry for production, transportation and utilization purposes. Existing measurement methods still have drawbacks, such as being time-consuming, destroying samples and yielding unstable outcomes. To address these issues, this paper explored the utilization of the broadband microwave spectrum for intelligent coal moisture measurement. A multi-type outlier detection method based on the Monte-Carlo cross-validation (MCCV) strategy was used to prevent masking effects in the microwave spectra. In order to effectively extract microwave spectral features and establish correlations with coal moisture, a novel neural network model, UC-PLSR, was proposed by combining U-Net, the Convolutional Block Attention Module (CBAM) and the Partial Least Squares Regression (PLSR) algorithm. Furthermore, a design scheme/case of a microwave measurement device for coal moisture was presented, offering guidance for the development of rapid coal moisture measurement instruments or on-site measurement systems. 
Experimental results demonstrated that the proposed model outperformed traditional chemometrics methods, achieving superior prediction accuracy and generalization capability with <em>R</em><sup>2</sup> = 0.8756, MAE = 1.2523 and RMSE = 1.6560.</p></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"252 ","pages":"Article 105175"},"PeriodicalIF":3.7,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141638995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
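For reference, the three figures of merit quoted in the abstract above (R², MAE and RMSE) can be computed as in the short sketch below. It uses the standard textbook definitions; the abstract does not state which exact variants the authors used, and the moisture values in the example are invented for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse

# Hypothetical moisture values (%), for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
r2, mae, rmse = regression_metrics(observed, predicted)
```

MAE and RMSE carry the units of the measured quantity, whereas R² is dimensionless, so the three values are best read together.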