Chemometrics and Intelligent Laboratory Systems: Latest Articles

Adversarial Domain Adaptation Guided by Farthest Distance for open set electronic nose drift compensation

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-22 DOI: 10.1016/j.chemolab.2025.105554

Yong Pan , Chuandong Li , Jiang Xiong , Ziye Hou , Youbin Yao

Abstract: With advancements in modern science and technology, electronic noses (ENs) have gained significant attention for their applications in environmental monitoring, food quality inspection, and medical equipment. ENs mimic biological olfactory systems to classify gases using arrays of sensors and pattern recognition models. However, gas sensor drift poses a major challenge, leading to performance degradation in EN systems. To address this, Domain Adaptation (DA) methods align source domain data with target domain drift data. While traditional DA methods assume identical class compositions in both domains, this is often unrealistic in practice, leading to suboptimal results. Open Set Domain Adaptation (OSDA) methods address unknown classes in the target domain, but they often focus too much on distinguishing unknown classes, neglecting accurate recognition of known classes. To overcome these limitations, we propose the Adversarial Domain Adaptation Guided by Farthest Distance (ADA-FDG), comprising two complementary modules: Farthest Distance Guide (FDG) and Confidence Normalized Adaptive Factor (CNAF). FDG adaptively builds a guide set that lies farthest from the source distribution in feature space, ensuring adversarial alignment learns to the edge region distribution. CNAF assigns a weight to each batch proportional to its classification confidence, preventing unknown-class samples from contaminating the ADA process. By integrating FDG and CNAF in an adversarial training framework, ADA-FDG achieves more precise alignment of source and target distributions while preserving clear separation between known and unknown classes. Extensive experiments on two benchmark datasets demonstrate that ADA-FDG consistently outperforms state-of-the-art closed and open set DA methods, delivering significant improvements in overall, known-class, and unknown-class accuracy.

Volume 267, Article 105554

Citations: 0

Chemometric modelling of anticancer drugs using CatBoost regression and graphical derivatives

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-17 DOI: 10.1016/j.chemolab.2025.105551

Yingxuan Huang , Muhammad Farhan Hanif , Eiman Maqsood , Mudassar Rehman

Abstract: In this work, a chemometric methodology based on graph topology descriptors and CatBoost regression is proposed for predicting the physicochemical properties of anticancer drugs. Molecular structures were encoded as graphs, and degree-based topological descriptors were derived to capture their complexity. These descriptors were used in the construction of regression models predicting boiling point, molar refractivity, and polarizability. The first statistical analysis with linear and cubic regression demonstrated that models of order higher than unity were able to take into account the non-linear dependence of descriptors vs. molecular properties. A CatBoost regression model was used for improved predictability and better interpretability. This model exhibits a coefficient of determination ($R^2$) of 0.997 for the prediction of boiling point and superior performance across the other two properties, with average absolute errors lower than 2%. Of importance, we identified several graph descriptors as important predictors, which confirmed their chemometric relevance. The method may contribute useful information as a complement to current machine learning-based models used for prediction of drug properties in chemoinformatics or pharmaceutical drug development; it integrates chemical graph theory with intelligent reasoning and modeling for a more fault-tolerant and generalized solution to drug property prediction.

Volume 267, Article 105551

Citations: 0

Robust soft sensor development based on Dirichlet process mixture of regression model for multimode processes

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-11 DOI: 10.1016/j.chemolab.2025.105550

Changrui Xie, Xi Chen

Abstract: Industrial processes often exhibit multimode characteristics due to factors like load variations, equipment changes, and feedstock fluctuations. This paper introduces a Dirichlet Process-based Twofold-Robust Mixture Regression Model (DPR$^2$MRM) for multimode processes. As a Bayesian nonparametric model, it automatically determines the number of mixture components from observed data using Dirichlet process mixture techniques, avoiding underfitting and overfitting. The model employs a Student's-t mixture model for input space learning, leveraging its long-tail properties for robust mode identification. For each mode, a regression model is built to capture the relationship between inputs and outputs, incorporating Student's-t noise to ensure robustness against output space outliers. The optimal posteriors of the model parameters are inferred within a full Bayesian framework, and an analytical posterior predictive distribution is derived. The effectiveness of the DPR$^2$MRM is demonstrated through a numerical example and two industrial applications.

Volume 267, Article 105550

Citations: 0

Bayesian optimization for interval selection in PLS models

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-10 DOI: 10.1016/j.chemolab.2025.105541

Nicolás Hernández , Yoonsun Choi , Tom Fearn

Citations: 0

High-accuracy leather species identification via Raman spectroscopy and attention-enhanced 1D-CNN

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-09 DOI: 10.1016/j.chemolab.2025.105549

Zhen Li, Jiang Zhang

Abstract: Leather derived from different animal sources exhibits significant differences in both performance and value. Traditional leather identification methods suffer from subjectivity, inefficiency, and high costs, motivating the need for rapid, objective, and cost-effective alternatives. To achieve rapid and non-destructive classification of leather types, our study introduces a novel combination of Raman spectroscopy and a one-dimensional convolutional neural network (1D-CNN) enhanced with a self-attention mechanism to efficiently capture subtle spectral differences among leather types. A total of 1066 Raman spectra from cow, sheep, pig, and crocodile leathers were collected. Spectral data underwent smoothing, baseline correction, and normalization. Seven samples from each leather class were randomly assigned to the training set, while the remaining three samples per class were designated as an independent validation set. Data augmentation was performed by adding Gaussian noise and applying slight spectral shifts to simulate real-world variability, expanding the training set to 3,810 samples. The proposed 1D-CNN model incorporates a self-attention mechanism to extract key spectral features and is compared with machine learning models and 1D-CNN models that do not integrate attention mechanisms. Experimental results demonstrate that our method outperforms existing approaches. After incorporating the self-attention mechanism, the model maintained a high accuracy during cross-validation, while its average classification accuracy on the independent test set increased from 92.11% to 96.28%. This result demonstrates that the proposed approach achieves enhanced generalization performance under different data partitioning schemes. This efficient, non-destructive, and reliable method not only enables accurate leather species identification and luxury goods authentication, but also shows promise for broader material classification and quality control applications.

Volume 267, Article 105549
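The augmentation step named in the abstract above (Gaussian noise plus slight spectral shifts applied to the training spectra) can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the noise level `noise_sd`, the shift range `max_shift` (in channels), and the edge handling are all hypothetical choices that the abstract does not specify.

```python
import random


def augment_spectrum(spectrum, noise_sd, max_shift, rng):
    """Return a shifted, noisy copy of one spectrum (a list of intensities).

    Assumption: the shift is a whole number of channels drawn uniformly
    from [-max_shift, max_shift]; edges are padded by repeating end values.
    """
    n = len(spectrum)
    shift = rng.randint(-max_shift, max_shift)
    shifted = [spectrum[min(max(i - shift, 0), n - 1)] for i in range(n)]
    # Additive zero-mean Gaussian noise on every channel.
    return [v + rng.gauss(0.0, noise_sd) for v in shifted]


def augment_dataset(spectra, labels, copies, noise_sd=0.01, max_shift=2, seed=0):
    """Keep the original spectra and append `copies` augmented versions of each."""
    rng = random.Random(seed)
    out_x, out_y = list(spectra), list(labels)
    for x, y in zip(spectra, labels):
        for _ in range(copies):
            out_x.append(augment_spectrum(x, noise_sd, max_shift, rng))
            out_y.append(y)
    return out_x, out_y
```

As in the abstract, augmentation of this kind would be applied to the training set only, so that the held-out samples stay untouched.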

Citations: 0

Discovery of new anti-HIV candidate molecules with an AI-based multi-stage system approach using molecular docking and ADME predictions

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-08 DOI: 10.1016/j.chemolab.2025.105543

Harun Uslu , Bihter Das , Huseyin Alperen Dagdogen , Yunus Santur , Seval Yılmaz , Ibrahim Turkoglu , Resul Das

Abstract: The discovery of novel therapeutic molecules against the Human Immunodeficiency Virus (HIV) remains a critical research priority due to the persistent global impact of the disease. Traditional drug discovery processes are often time-consuming, costly, and limited in predictive capacity at early stages. In this study, we propose a three-stage AI-supported framework that integrates deep learning and molecular docking to accelerate candidate identification. First, a customized Autoencoder–Long Short-Term Memory (LSTM) model was employed to generate novel molecular structures consistent with key pharmacokinetic rules. Second, a Geometric Deep Learning (GDL) model was designed to evaluate interactions with major HIV-1 targets, including integrase, protease, and reverse transcriptase. Finally, in silico docking simulations assessed binding affinities and inhibition constants. The framework generated molecules that not only complied with pharmacokinetic and drug-likeness criteria (e.g., QED, ADME, SAScore) but also demonstrated favorable binding properties, particularly towards HIV-1 reverse transcriptase. These findings highlight the potential of the proposed approach to complement early-stage drug discovery and to contribute to the design of promising lead compounds for further experimental validation.

Volume 267, Article 105543

Citations: 0

Monte Carlo peaks: Simulated datasets to benchmark machine learning algorithms for clinical spectroscopy

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-08 DOI: 10.1016/j.chemolab.2025.105548

Jaume Béjar-Grimalt , Ángel Sánchez-Illana , Guillermo Quintás , Hugh J. Byrne , David Pérez-Guaita

Abstract: Infrared and Raman spectroscopy hold great promise for clinical applications. However, the inherent complexity of the associated spectral data necessitates the use of advanced machine learning techniques which, while powerful in extracting biological information, often operate as black-box models. Combined with the absence of standardized datasets, this hinders model optimization, interpretability, and the systematic benchmarking of the growing number of newly developed machine learning methods. To address this, we propose a simulation-based framework for generating fully synthetic spectral datasets using Monte Carlo approaches for benchmarking. The artificial datasets mimic a wide range of realistic scenarios, including overlapping spectral markers and non-discriminant features and can be adjusted to simulate the effect of different parameters, such as instrumental noise, number of interferences, and sample size. These spectra are simulated through the generation of Lorentzian bands across the mid-infrared range, without specific reference to experimental data or chemical structures. We used the proposed methodology to compare different spectral marker identification protocols in a partial least squares discriminant analysis (PLS-DA), showing that the orthogonal PLS-DA (OPLS-DA) approach, when combined with marker selection based on VIP scores or the regression vector, yielded higher sensitivity, specificity, and interpretability than standard PLS-DA using the same selection criteria. This framework was further used to benchmark the classification capabilities of commonly employed machine learning algorithms, incorporating both linear and non-linear markers reflective of compositional variations across the target classes. Key findings were validated using real infrared spectra from human blood serum and saliva collected in the frame of a clinical study. Overall, the proposed approach provides a versatile sandbox environment for the systematic evaluation of data analysis strategies in vibrational spectroscopy, that can help experimentalists to better interpret spectral markers or data analysts focused on benchmarking and validating new algorithms.

Volume 267, Article 105548
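The core idea in the abstract above, building synthetic spectra as sums of randomly placed Lorentzian bands and letting one band act as a class marker, can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and does not come from the paper: the function names, the 900–1800 cm⁻¹ window, the band count, the parameter ranges, and the choice of a single intensity-based marker.

```python
import random


def lorentzian(x, center, width, height):
    """Lorentzian band with peak `height` at `center` and half-width at half-maximum `width`."""
    return height * width ** 2 / ((x - center) ** 2 + width ** 2)


def simulate_spectrum(wavenumbers, bands, noise_sd, rng):
    """Sum the (center, width, height) bands and add Gaussian instrumental noise."""
    return [
        sum(lorentzian(x, c, w, h) for c, w, h in bands) + rng.gauss(0.0, noise_sd)
        for x in wavenumbers
    ]


def simulate_dataset(n_per_class=20, n_bands=15, marker_shift=0.2, noise_sd=0.01, seed=0):
    """Two-class dataset; band 0 is the (hypothetical) marker, stronger in class 1."""
    rng = random.Random(seed)
    wavenumbers = list(range(900, 1801, 4))  # illustrative window, cm^-1
    # Band positions, widths, and heights are drawn at random, shared by both classes.
    base = [
        (rng.uniform(900, 1800), rng.uniform(5, 30), rng.uniform(0.1, 1.0))
        for _ in range(n_bands)
    ]
    spectra, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            bands = []
            for i, (c, w, h) in enumerate(base):
                h_i = h * rng.uniform(0.9, 1.1)  # sample-to-sample variability
                if i == 0 and label == 1:
                    h_i *= 1 + marker_shift  # discriminant feature
                bands.append((c, w, h_i))
            spectra.append(simulate_spectrum(wavenumbers, bands, noise_sd, rng))
            labels.append(label)
    return wavenumbers, spectra, labels
```

Because the marker band is known by construction, a dataset like this lets one check whether a marker-selection method recovers it, which is the kind of benchmarking the abstract describes; `noise_sd`, `n_bands`, and `n_per_class` stand in for the adjustable noise, interference, and sample-size parameters.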

Citations: 0

Enhanced PLS subspace-based calibration transfer method for multiple spectrometers using small standardization sample sets

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-10-03 DOI: 10.1016/j.chemolab.2025.105545

Bin Li , Eizo Taira , Tetsuya Inagaki

Abstract: Near-infrared spectroscopy (NIRS) calibration transfer faces significant challenges when deploying models across multiple instruments from different manufacturers, particularly because the inherently low molar absorptivity makes spectral data highly sensitive to minor variations in optical setup. This study presents two enhanced calibration transfer methods (ICTWM1 and ICTWM2) operating within the PLS latent variable space, utilizing dimensionality reduction to preserve analytically relevant variance while reducing noise interference. ICTWM1 employs spectral space transformation (SST) to correct PLS component scores between different instruments, while ICTWM2 selectively corrects the regression coefficients of the principal components with the highest variability.

The methods were validated using wheat protein analysis (7 secondary instruments from manufacturers A and B) and industrial sugarcane Brix determination (8 secondary instruments across geographically distributed facilities). ICTWM1 demonstrated superior performance, achieving 79.3% relative performance compared to the primary instrument model using only 10 standardization samples on the wheat dataset, with improved cross-instrument consistency (standard deviations of 6.9%) compared to traditional methods (>15%). The method exhibited no manufacturer-dependent performance bias and maintained consistent performance across sample sizes ranging from 10 to 110. On the severely constrained sugarcane dataset with only 5 training samples, both ICTWM1 and ICTWM2 achieved good performance with mean RMSEP values of 0.14 °Bx and 0.15 °Bx, respectively, outperforming traditional calibration transfer methods.

ICTWM1 demonstrates improved sample efficiency and cross-manufacturer robustness through optimized transformation within PLS subspace. These characteristics make it a practical method for industrial NIRS applications requiring reliable calibration transfer with minimal standardization samples.

Volume 267, Article 105545

Citations: 0

An automated preprocessing framework for near infrared spectroscopic data

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-09-28 DOI: 10.1016/j.chemolab.2025.105542

Xiaojing Chen , Zhonghao Xie , Roma Tauler , Yong He , Pengcheng Nie , Yankun Peng , Liang Shu , Shujat Ali , Guangzao Huang , Wen Shi , Xi Chen , Leiming Yuan

Abstract: Preprocessing plays a vital role in the analysis of Near-infrared spectroscopy (NIRS) data as it aims to remove unintended artifacts. This process involves a series of steps, each with a specific focus on a particular artifact. However, due to the diverse range of NIRS applications, selecting the optimal combination of preprocessing methods remains a challenge. To address this issue, we propose an automated preprocessing framework that can quickly identify the optimal preprocessing strategy. The framework initially constructs a workflow consisting of multiple types of preprocessing methods. Then, a genetic algorithm (GA) technique is used to optimize the best pipeline, avoiding exhaustive searches. In addition, we impose a penalty for the loss function of the GA process to obtain a parsimonious solution. Results on three real-world datasets demonstrate that our approach outperforms several state-of-the-art ensemble preprocessing methods in terms of prediction error. Compared to the raw data, the optimal preprocessing method can improve model performance by at least 48%. Furthermore, our framework enables the identification of the most effective preprocessing methods included in the best pipeline. The source code for our approach is available on GitHub and can be easily integrated with other existing preprocessing techniques.

Volume 267, Article 105542

Citations: 0

Conformalized outlier detection for mass spectrometry data

IF 3.8, Zone 2 (Chemistry)

Chemometrics and Intelligent Laboratory Systems Pub Date : 2025-09-23 DOI: 10.1016/j.chemolab.2025.105539

Yangha Chung , Johan Lim , Xinlei Wang , Soohyun Ahn

Citations: 0