{"title":"Stacking density estimation and its oversampling method for continuously imbalanced data in chemometrics","authors":"Xin-Ru Zhao , Lun-Zhao Yi , Guang-Hui Fu","doi":"10.1016/j.chemolab.2025.105366","DOIUrl":"10.1016/j.chemolab.2025.105366","url":null,"abstract":"<div><div>Continuously imbalanced data means that the target variable is continuous and its distribution is uneven. This kind of data is widespread in many practical application areas. However, methods to effectively handle continuously imbalanced data have been relatively scarce, and there is an urgent need to establish corresponding imbalance regression methods to enhance the capability of handling continuously imbalanced data. Firstly, we propose a Stacking-based density estimation (SDE) method to solve the density estimation problem of continuously imbalanced target variables. SDE links density estimation with the ensemble learning algorithm called Stacking, and its core concept is the “fusion of multiple perspectives for accurate capture”. Performing SDE enhances the model’s understanding of complex data structures and makes it more sensitive and accurate in identifying rare values. Subsequently, we investigate an SDE-based oversampling technique (SDE-OS). SDE-OS uses SDE to synthesize new rare instances in the rare-value region, achieving fine-tuned customization of rare-value additions. In a series of numerical experiments, SDE estimates the density more accurately than the kernel density estimation method as measured by ANLL. SDE-OS outperforms conventional sampling methods such as SMOGN and SMOTER in various metrics. 
Therefore, the proposed SDE and SDE-OS are highly competitive and effective tools for addressing the imbalanced regression problem.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105366"},"PeriodicalIF":3.7,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143628369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ChemTastesPredictor: An ensemble of machine learning classifiers to predict the taste of molecular tastants","authors":"Cristian Rojas , Mónica Abril-González , Davide Ballabio , Fernando García","doi":"10.1016/j.chemolab.2025.105380","DOIUrl":"10.1016/j.chemolab.2025.105380","url":null,"abstract":"<div><div>The sense of taste plays a critical role in food science, since it directly impacts food consumption, human nutrition, and overall health. Computational models that predict the taste of molecular tastants based on their chemical structure and machine learning classifiers serve as powerful tools in the advancing field of foodinformatics. This study describes the development of <em>ChemTastesPredictor</em> designed to predict the taste of 4075 molecular tastants included in the extended version of <em>ChemTastesDB</em> (<span><span>https://zenodo.org/records/14963136</span><svg><path></path></svg></span>). To the best of our knowledge, this represents the largest dataset with a broad-based chemical space used to calibrate machine learning (ML) models for taste prediction based on molecular descriptors and fingerprints. For validation, datasets were randomly split into training and test sets in a 75:25 ratio, ensuring balanced class distributions. In binary classification tasks, the Random Forest classifier demonstrated the highest predictive performance for sweet/bitter (<em>NER</em> = 0.928 and <em>F-score</em> = 0.927) and bitter/non-bitter (<em>NER</em> = 0.902 and <em>F-score</em> = 0.903) classification. Adaptive Boosting excelled in the prediction of sweet/non-sweet (<em>NER</em> = 0.861 and <em>F-score</em> = 0.862). The <em>N</em>-Nearest Neighbors classifier emerged as the optimal classifier for umami/non-umami (<em>NER</em> = 0.957 and <em>F-score</em> = 0.860) and sweet/bitter/umami (<em>NER</em> = 0.870 and <em>F-score</em> = 0.843). 
These models may be useful in the development and analysis of new chemical tastants.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105380"},"PeriodicalIF":3.7,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143643898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An enhanced multilayer Res-Informer with Savitzky-Golay filter for predicting mixed CO and NOx emissions in gas turbines","authors":"Xun Su , Yanmei Zhang , Yiyi Zhang , Jiefeng Liu , Min Xu , Pengfei Jia","doi":"10.1016/j.chemolab.2025.105379","DOIUrl":"10.1016/j.chemolab.2025.105379","url":null,"abstract":"<div><div>Gas turbines emit large amounts of carbon monoxide (CO) and nitrogen oxides (NO<sub>x</sub>) when working, and the emission of CO and NO<sub>x</sub> poses serious harm to human health and environment. Therefore, accurately predicting CO and NO<sub>x</sub> emissions from gas turbines is of great significance. Traditional machine learning algorithms have significant drawbacks in handling long time series data. They typically require complex feature engineering to manage time dependencies, the modeling process is cumbersome and time-consuming, and they are limited in capturing nonlinear features and handling high-dimensional data, as well as effectively dealing with noise and non-stationarity in data. To address these issues, this study proposes an enhanced SGM-ResInformer. This method combines the characteristics of Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Savitzky-Golay filter, multilayer residual network (M-ResNet), and an improved Informer. Data denoising is performed using DBSCAN and Savitzky-Golay filters, M-ResNet enhances the extraction of complex features, better capturing nonlinear relationships in the data, and in the Informer, the original simple MaxPool1d layer in the self-attention distillation layer is replaced with a learnable convolutional layer for attention distillation operations. Experimental results show that compared to the traditional Informer model, the mean square error (MSE) of SGM-ResInformer is reduced by 44.26 %, indicating a significant performance improvement. 
Compared with other advanced algorithms like Autoformer, SG-informer, Transformer, and LSTM, SGM-ResInformer also shows varying degrees of improvement. Overall, the model excels in improving prediction accuracy, stability, adaptability, and generalization ability, making it particularly suitable for multi-step prediction tasks of complex time series data. These advantages make the model significantly outstanding in the task of predicting emissions from gas turbines.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105379"},"PeriodicalIF":3.7,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143637473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correlations between the constituent molecules, crystal structures, and dielectric constants in organic crystals","authors":"Yuya Shiraki, Hiromasa Kaneko","doi":"10.1016/j.chemolab.2025.105376","DOIUrl":"10.1016/j.chemolab.2025.105376","url":null,"abstract":"<div><div>Organic crystals are crystals composed of molecules of organic compounds. Materials made from organic crystals are used in devices, such as capacitors and ferroelectric memories. It is desirable to develop new materials with improved physical properties such as the dielectric constant (DC). However, the relationship between the constituent molecule (CM), crystal structure (CS), and DC of the organic crystals is not clearly understood. In this study, we investigated the relationship between CM, CS, and DC with existing data and machine learning. Using regression analysis, we could construct machine learning models between CM and DC, between CS and DC, and between CM and CS, and could predict DC from CM, DC from CS, and CS from CM using the constructed models.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105376"},"PeriodicalIF":3.7,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143610625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing adsorption efficiency: A novel application of SVM_Boosting_IGWO for methylene blue dye removal using low-cost fruit peels adsorbents","authors":"Nasma Bouchelkia , Hichem Tahraoui , Kheira Benazouz , Amal Mameri , Reguia Boudraa , Hamza Moussa , Nadia Hamri , Ryma Merdoud , Hayet Belkacemi , Abdelhalim Zoukel , Abdeltif Amrane , Mohammed Kebir , Lotfi Mouni","doi":"10.1016/j.chemolab.2025.105377","DOIUrl":"10.1016/j.chemolab.2025.105377","url":null,"abstract":"<div><div>In this work, the potential for employing orange peels (OP) and potato peels (PP) as biosorbents to remove the methylene blue dye (MB) from aqueous solutions is studied. Several physicochemical methods, such as FTIR, SEM-EDX, X-ray diffraction, and pH point of zero charge measurement, were used to analyze the adsorbents. FTIR analysis revealed changes in peak intensities after dye adsorption. SEM analysis confirmed the presence of starch in the PP adsorbent, while no apparent pore structure was observed in the OP adsorbent. EDX analysis showed that carbon and oxygen were the main components on the surfaces of OP and PP. X-ray diffraction patterns indicated that both adsorbents were amorphous materials. The impact of different factors, including adsorbent dosage, contact time, temperature, initial dye concentration, pH and particle size, on the biosorption process was studied. Kinetic studies revealed that equilibrium was reached within a few minutes of contact, and the MB removal followed the pseudo-second-order model. Furthermore, a novel predictive model combining Support Vector Machine (SVM) with Boosting and the Improved Grey Wolf Optimizer (IGWO) algorithm was developed. The SVM-IGWO-Boosting model exhibited excellent performance in predicting methylene blue adsorption, demonstrating perfect correlation and low prediction error. 
The IGWO optimization approach effectively optimized the input parameters for the adsorbents, resulting in excellent agreement between experimental and predicted values. Moreover, OP showed higher efficiency in removing methylene blue compared to PP, with a maximum capacity of 111.75 mg/g for OP and 96.67 mg/g for PP using IGWO. The use of orange peels and potato peels as agricultural waste for methylene blue removal offers an efficient and sustainable solution. The SVM-IGWO-Boosting predictive model, in conjunction with the IGWO optimization approach, provides a promising tool for predicting and optimizing the adsorption efficiency of MB adsorbents. These findings present valuable prospects for real-world applications requiring accurate and reliable predictions.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105377"},"PeriodicalIF":3.7,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143621188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Field-deployable real-time AI System for chemical warfare agent detection using YOLOv8 and colorimetric sensors","authors":"Sojeong Bae , Ku Kang , Young Kyun Kim , Yoon Jeong Jang , Doo-Hee Lee","doi":"10.1016/j.chemolab.2025.105365","DOIUrl":"10.1016/j.chemolab.2025.105365","url":null,"abstract":"<div><div>Chemical warfare agents (CWAs) pose serious risks, requiring rapid, accurate detection. This study presents a real-time, lightweight AI system using YOLOv8 and colorimetric sensors, designed for field deployment. A dataset of 1,340 images captured under varying conditions enhances robustness. The model achieves 91.3% mAP@0.5 and 10.4 ms/frame inference time on portable hardware. This system bridges the gap between laboratory methods and scalable field detection, ensuring efficient, on-site CWA identification for military, emergency response, and public health applications.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105365"},"PeriodicalIF":3.7,"publicationDate":"2025-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143592841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Augmented designs to choose between constant absolute and relative errors and to estimate model parameters","authors":"Carlos de la Calle-Arroyo , Samantha Leorato , Licesio J. Rodríguez-Aragón , Chiara Tommasi","doi":"10.1016/j.chemolab.2025.105362","DOIUrl":"10.1016/j.chemolab.2025.105362","url":null,"abstract":"<div><div>In experimental sciences such as chemistry, the measurement error may be homoscedastic or heteroscedastic. The data should be collected with the goal of identifying the right error-variance structure, as an incorrectly specified model would lead to wrong conclusions. A design criterion that reflects this goal is KL-optimality. Frequently, however, KL-optimum designs are wholly inefficient for other inferential purposes, such as precise estimation. In this case, the addition of some experimental points might be convenient. This work focuses on the enrichment of a design through the inclusion of some additional support points, with the goal of guaranteeing a minimum KL-efficiency to be able to optimally choose between different variance specifications. This strategy is also useful for modifying a design that is already available, for instance a D-optimal design, to manage the problem of correct error-variance specification.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105362"},"PeriodicalIF":3.7,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143637472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Just-in-time process soft sensor with spatiotemporal graph decoupled learning","authors":"Mingwei Jia , Qiao Liu , Lingwei Jiang , Yi Liu , Zengliang Gao , Tao Chen","doi":"10.1016/j.chemolab.2025.105367","DOIUrl":"10.1016/j.chemolab.2025.105367","url":null,"abstract":"<div><div>Deep learning-based just-in-time soft sensors effectively handle the strong nonlinearity of complex process industry, but their implementation faces significant challenges in interpretability and time cost. Hence, a just-in-time soft sensor based on spatiotemporal graph decoupling is proposed. To decrease time cost, it employs a global-local modeling strategy: pre-training on all historical data to build a global model, and fine-tuning with relevant samples to deliver a local model. To enhance interpretability, couplings that reflect how variables interact with each other in spatiotemporal dimensions are constructed, conforming to prior knowledge, to guide the graph neural network as a global model during pre-training. The global model decouples variables to quantify their influence as intrinsic information, enabling a clearer understanding of how each variable contributes to the prediction. Following the intrinsic information, relevant samples are then selected with the preset relevance metric to fine-tune the global model. 
Finally, two industrial cases demonstrate this model's low runtime, effectiveness, and physical consistency from the perspectives of underlying physics.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"261 ","pages":"Article 105367"},"PeriodicalIF":3.7,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143579500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An iterative conditional variable selection method for constraint-based time series causal discovery","authors":"Zihang Wang , Shuai Li , Xiaofeng Zhou , Shijie Zhu","doi":"10.1016/j.chemolab.2025.105361","DOIUrl":"10.1016/j.chemolab.2025.105361","url":null,"abstract":"<div><div>Time series causal discovery aims to identify cause-effect relationships among variables from time series data, providing valuable insights into complex real-world scenarios. However, existing constraint-based causal discovery methods face challenges such as limited detection power, stemming from issues like dimensionality explosion and uncertainty caused by indirect paths. To address these problems, we propose a novel iterative conditional variable selection method designed for lagged, linear, and nonlinear causal discovery in time series. (1) Firstly, we block indirect information while minimizing the dimensionality of the conditioning set. Specifically, our method selects the parent set of each target variable as the conditioning set, which includes only those variables involved in the indirect path. (2) Then, we refine the conditioning set by selecting a subset of the parent set for each target variable to focus on indirect causal relationships. (3) Finally, the iterative application of steps (1) and (2) progressively corrects the indirect paths, leading to a significant improvement in detection power. 
Experimental results on synthetic and public datasets, as well as for varying time lags, node counts, and a chemical fault diagnosis case, demonstrate that our method outperforms state-of-the-art (SOTA) approaches.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"260 ","pages":"Article 105361"},"PeriodicalIF":3.7,"publicationDate":"2025-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143520783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mechanism- and data-driven based dynamic hybrid modeling for multi-condition processes","authors":"Yanan Zhang , Gaowei Yan , Shuyi Xiao , Fang Wang , Guanjia Zhao , Suxia Ma","doi":"10.1016/j.chemolab.2025.105353","DOIUrl":"10.1016/j.chemolab.2025.105353","url":null,"abstract":"<div><div>In process industries, the complexity and variability of working conditions make it challenging to accurately measure product quality. While data-driven models have developed rapidly, they often overlook the underlying physical or chemical mechanisms. To address this, we propose a hybrid modeling approach that combines mechanism- and data-driven methods. Historical and current working condition data are processed through a hidden layer to extract features. The partial differential equation is discretized and approximated using the forward Euler method to derive mechanism-based quality variable values. These values are then combined with real data through a weighted mix to create a new label for dynamic regression. Additionally, a domain adaptation regularization term is introduced to align the distributions of different working conditions. 
Through analyses of three process industry datasets, we demonstrate that this method can predict unmeasurable variables with reasonable accuracy and exhibits stronger generalization ability compared to pure data-driven models.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"260 ","pages":"Article 105353"},"PeriodicalIF":3.7,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143507522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}