Journal of Chemometrics: Latest Articles, Page 5

Data Quality: Importance of the ‘before analysis’ domain (Theory of Sampling, TOS)

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-04-06 DOI: 10.1002/cem.70025

{"title":"Data Quality: Importance of the ‘before analysis’ domain (Theory of Sampling, TOS)","authors":"","doi":"10.1002/cem.70025","DOIUrl":"10.1002/cem.70025","url":null,"abstract":"Data analysts/chemometricians are part of a scientific collegium covering three distinct domains: i) sampling – ii) analysis – iii) data modelling, which collectively influence ‘data quality’. There is much more to data quality than analytical uncertainty. There are many situations where analysis is to be made of heterogeneous materials/batches/lots/flowing streams, which need to be sampled appropriately before analysis, following an often long and complex pathway ‘from-lot-to-aliquot’. In most cases, sampling and sub-sampling will dominate the total Measurement Uncertainty budget (MUtotal). Left-out MUsampling contributions may easily overwhelm the Total Analytical Error (TAE) uncertainty by factors of 5, 10, 25 or higher as a function of the specific heterogeneity characteristics of the materials and systems targeted, and of the sampling procedure used (grab vs. composite sampling). Focus is here on the consequences of unwittingly ignoring the uncertainties originating in these domains, which e.g. will adversely influence bilinear component directions (reducing model accuracy) as well as RMSE estimates reflecting precision (analyte concentration prediction, classification, time series prediction) and along the way will also clear up an evergreen mistake: contrary to many beliefs, ‘more data’ will not automatically reduce the magnitude of an unsatisfactory performance RMSE. It is shown how the Theory of Sampling (TOS) is the only guarantor of representative sampling in the critical ‘before analysis’ domain. This article introduces the essential minimum TOS competence which must be mastered by stakeholders from all three domains. The conceptual elements in the TOS system can be visualised as a graphic overview: [figure not included] Kim H. Esbensen has been professor at three universities (National Geological Survey of Denmark and Greenland (2010–2015), Aalborg University, Denmark (2001–2010), Telemark Institute of Technology, Norway (1990–2000)) and professeur associé, Université du Québec à Chicoutimi, before switching to a quest as an independent consultant in 2015. He is a member of several scientific societies and has published widely across several scientific fields. He is the author of a widely used textbook in Multivariate Data Analysis (chemometrics), and in 2020 published: “Introduction to the Theory and Practice of Sampling”. He was chairman of the taskforce responsible for the world's first horizontal (matrix-independent) sampling standard DS 3077:2024. Esbensen is the founding editor of “Sampling Science and Technology (SST)” (https://www.sst-magazine.info/issues/). He can be reached at his homepage https://kheconsult.com/","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70025","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143787231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Data Quality: Importance of the ‘Before Analysis’ Domain [Theory of Sampling (TOS)]

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-04-06 DOI: 10.1002/cem.70021

Kim H. Esbensen

{"title":"Data Quality: Importance of the ‘Before Analysis’ Domain [Theory of Sampling (TOS)]","authors":"Kim H. Esbensen","doi":"10.1002/cem.70021","DOIUrl":"10.1002/cem.70021","url":null,"abstract":"Data Quality: what is it, where does it originate, how does it influence data modelling, what can chemometricians do about it? The ‘before analysis’ domain is prone to sampling errors resulting in uncertainties influencing the quality of both analysis and data analysis/data modelling. Nonrepresentative sampling of heterogeneous materials, batches, lots and process streams ‘before analysis’ contribute significantly to the total measurement uncertainty, MUtotal = MUsampling + MUanalysis. The total sampling error (TSE) can dominate over the total analytical error (TAE) by factors ranging 5, 10 or higher, depending on the degree of material heterogeneity encountered and the specific sampling procedure employed to produce the final analytical aliquot, which is the only material actually analysed. The analytical aliquot is the physical manifestation of transgressing the boundary from the before analysis (sampling) domain to the domain of analysis. It is only possible to guarantee representativity of the analytical aliquot, and thus of the analytical results with respect to the original target batch/lot/process stream, by invoking the necessary sampling domain competence stipulated by theory of sampling (TOS). Primary sampling is the most important stage in the full lot-to-analysis pathway, quantitatively dominating MUtotal (but subsequent subsampling stages can also be significant). If the sources of adverse sampling error effects have not been eliminated, the sampling process is biased and MUtotal will be unnecessarily inflated. TOS offers ways and means to deal actively with a potential sampling bias (which is fundamentally different from the analytical bias). 
Overlooking, or deliberately ignoring dealing appropriately with sampling effects constitutes a lack of due diligence, which has critical bearings on the QC/QA demands on both analysis and data analysis/modelling. This article presents all uncertainty contributions in the lot-to-analysis-to-data modelling pathway, which must be identified and managed, eliminated or maximally reduced, to be able to document a fully minimised MUtotal. Data analysts/chemometricians are part of a scientific collegium covering all three domains: sampling—analysis—data modelling, which are collectively responsible for ‘data quality’. This comprehensive scope has serious implications for the current PAT paradigm, the foundation of which turns out to need significant reform regarding a key process sampling aspect regardless of whether physical samples, or PAT sensor technology spectra, are extracted/acquired. This article introduces the essential minimum TOS competence that must be mastered by stakeholders from all three domains.","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70021","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143787233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Expandable Diffusion Map–Based Weighted k-Nearest Neighbor Technique for Multimode Batch Process Monitoring

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-04-05 DOI: 10.1002/cem.70020

Liwei Feng, Yifei Wu, Shaofeng Guo, Yu Xing, Yuan Li

{"title":"Expandable Diffusion Map–Based Weighted k-Nearest Neighbor Technique for Multimode Batch Process Monitoring","authors":"Liwei Feng, Yifei Wu, Shaofeng Guo, Yu Xing, Yuan Li","doi":"10.1002/cem.70020","DOIUrl":"10.1002/cem.70020","url":null,"abstract":"The diffusion map–based k-nearest neighbor (DM-kNN) rule faces two challenges in multimode batch process monitoring. Firstly, the DM method encounters difficulties in projecting new samples. The training samples are repeatedly feature extracted, resulting in a time-consuming process. Faulty samples may be merged into normal samples and modeled together, which does not meet the requirements for fault detection. Secondly, DM-kNN has poor monitoring performance for multimode processes with significant variance differences. This paper proposes a technique called the expandable DM–based weighted k-nearest neighbor (EDM-WkNN) to solve these two issues. The expandable DM constructs a local projection matrix to project new samples. The effect of mode variance differences is eliminated by introducing weighted distances into the statistic. We compare EDM-WkNN with classical fault detection methods through numerical examples and the fed-batch fermentation penicillin (FBFP) process. Our experiments confirm that the EDM-WkNN method effectively monitors faults in multimode batch processes.","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143778291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
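
The distance-based statistic behind kNN monitoring, as described in the abstract above, might be sketched as follows. This is a plain kNN rule only: the statistic is the sum of squared distances to the k nearest normal training samples, and the control limit is an empirical quantile of the training statistics. It leaves out the diffusion-map projection and the distance weighting that the paper proposes. The function names, the toy grid data, `k=3`, and the 0.99 quantile are all illustrative assumptions, not taken from the paper.

```python
import math

def knn_statistic(sample, reference, k):
    """Sum of squared Euclidean distances from `sample` to its k nearest reference points."""
    d2 = sorted(sum((a - b) ** 2 for a, b in zip(sample, ref)) for ref in reference)
    return sum(d2[:k])

def fit_control_limit(train, k, quantile=0.99):
    """Empirical quantile of leave-one-out kNN statistics on normal training data."""
    stats = []
    for i, x in enumerate(train):
        others = train[:i] + train[i + 1:]  # exclude the sample itself
        stats.append(knn_statistic(x, others, k))
    stats.sort()
    idx = min(len(stats) - 1, math.ceil(quantile * len(stats)) - 1)
    return stats[idx]

def is_faulty(sample, train, k, limit):
    """Flag a new sample whose kNN statistic exceeds the control limit."""
    return knn_statistic(sample, train, k) > limit

# Hypothetical normal operating data: a small 5 x 5 grid in two variables.
train = [[0.1 * i, 0.1 * j] for i in range(5) for j in range(5)]
limit = fit_control_limit(train, k=3)

print(is_faulty([0.2, 0.2], train, 3, limit))  # a point inside the normal region
print(is_faulty([2.0, 2.0], train, 3, limit))  # a point far from the normal region
```

A single global limit like this one is exactly what struggles when operating modes have very different variances, which is the motivation the abstract gives for weighting the distances.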

Smart Monitoring Solutions for Real-Time Water pH Regulation in Aquatic Ecotoxicology

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-04-03 DOI: 10.1002/cem.70024

Usman Ibrahim, Nasir Abbas, Muhammad Riaz, Tahir Mahmood

{"title":"Smart Monitoring Solutions for Real-Time Water pH Regulation in Aquatic Ecotoxicology","authors":"Usman Ibrahim, Nasir Abbas, Muhammad Riaz, Tahir Mahmood","doi":"10.1002/cem.70024","DOIUrl":"10.1002/cem.70024","url":null,"abstract":"This study designs a statistical process control tool that effectively detects small and moderate shifts in process parameters, to address challenges in quality monitoring. The proposed control chart employs advanced statistical detection techniques to enhance sensitivity while reducing false alarms, thus improving detection performance in various applications. This methodology is applied in a real-life context within an aquatic ecotoxicology laboratory, where daily monitoring of water pH levels is essential for safeguarding the health of sensitive aquatic organisms, such as mysids. The laboratory environment is meticulously controlled to simulate natural conditions, and our application of the proposed control chart ensures that any deviations from the optimal pH level are detected promptly, thereby maintaining water quality and supporting the reliability of experimental outcomes. The paper comprehensively evaluates the performance of the proposed control chart in both zero-state and steady-state conditions, offering valuable insights for practitioners in the field. We present empirical evidence demonstrating that the proposed control chart significantly outperforms traditional control charts, including Shewhart, CUSUM, and EWMA, particularly in detecting small to moderate shifts in water pH levels. Furthermore, we provide optimal parameter settings tailored for specific monitoring scenarios, enhancing the applicability of the proposed control chart for quality control in laboratory environments.","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143770223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
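
For orientation, one of the traditional baselines named in the abstract above, the EWMA chart, can be sketched in a few lines. This is the textbook EWMA with time-varying limits, not the chart proposed in the paper, whose construction the abstract does not give. The target pH of 8.0, the standard deviation of 0.05, the smoothing constant `lam=0.2`, the limit width `L=3.0`, and the readings are all hypothetical values chosen for illustration.

```python
import math

def ewma_chart(values, mu0, sigma, lam=0.2, L=3.0):
    """Return (t, z, lower, upper, out_of_control) for each observation.

    z_t = lam * x_t + (1 - lam) * z_{t-1}, started at z_0 = mu0, with limits
    mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 t))).
    """
    z = mu0
    rows = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1 - lam) * z
        half_width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        lower, upper = mu0 - half_width, mu0 + half_width
        rows.append((t, z, lower, upper, not (lower <= z <= upper)))
    return rows

def first_signal(rows):
    """Index (1-based) of the first out-of-control point, or None if there is none."""
    for t, _, _, _, out in rows:
        if out:
            return t
    return None

# Hypothetical daily pH readings: on target for 5 days, then a small sustained upward shift.
readings = [8.0] * 5 + [8.06] * 20
rows = ewma_chart(readings, mu0=8.0, sigma=0.05)
print(first_signal(rows))
```

In this toy series no single reading leaves the Shewhart-style band of 8.0 ± 3 × 0.05, yet the accumulated EWMA statistic eventually crosses its limit; that sensitivity to small sustained shifts is the property the abstract is concerned with.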

Being Aware of Data Leakage and Cross-Validation Scaling in Chemometric Model Validation

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-04-01 DOI: 10.1002/cem.70026

Péter Király, Gergely Tóth

{"title":"Being Aware of Data Leakage and Cross-Validation Scaling in Chemometric Model Validation","authors":"Péter Király, Gergely Tóth","doi":"10.1002/cem.70026","DOIUrl":"10.1002/cem.70026","url":null,"abstract":"Chemometrics is one of the most elaborated data science fields. It has pioneered, and still pioneers, the use of novel machine learning methods over several decades. The literature of chemometric modeling is enormous; there are several guidance documents, software packages, and other descriptions on how to perform careful analysis. On the other hand, the literature is often contradictory and inconsistent. There are many studies where results on specific datasets are generalized without justification, and later, the generalized idea is cited without the original limits. In some cases, the difference in the nomenclature of methods causes misinterpretations. As in every field of science, there are also method preferences based on the strength of research groups rather than on a flexible, genuinely scientific selection among the possibilities. There is also some inconsistency between the practical approach of chemometrics and the theoretical statistical theories, where often unrealistic assumptions and limits are studied. The widely elaborated knowhow of chemometrics brings some rigidity to the field. There are some trends in data science to which chemometrics adapts slowly. An example is the exclusive thinking within the bias-variance trade-off model building [1] instead of using models in the double descent region for large datasets [2-4]. Another problematic question is data leakage. Up to now, chemometric models are often built and validated on data sets suffering from data leakage. In our investigations, we met cases where the huge literature background provided large inertia in the correction of misinterpretations. In 2021, we found that leave-one-out and leave-many-out cross-validation (LMO-CV) parameters can be scaled to each other [5]. Furthermore, we showed that the two ways have around the same uncertainty in multiple linear regression (MLR) calculations [6]. Therefore, the choice among these methods should be based on computational practicality rather than preconceptions. We obtained some formal and informal criticism about omitting results of some well cited studies. In this article, we present some examples to encourage rethinking of some traditional solutions in chemometrics. We show calculations illustrating how data leakage arises in chemometric tasks. Our other calculations focus on the scaling law in order to rehabilitate leave-one-out cross-validation. In machine learning, data leakage means the use of information during the model building which biases the prediction assessment of the model, or will not be available during real predictive application of the model. A typical and easy to detect example is when cases very similar to training ones are present in the test set. There is a different form of leakage, when variables or classes are present in the explanatory variables that are too closely related to the response variables. Data leakage causes problems in model ","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70026","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143749606","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
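
The "typical and easy to detect" leakage case in the abstract above, where test cases are nearly identical to training cases, might be illustrated with a toy example. The sketch below is not from the paper: it assumes six hypothetical specimens, each measured twice, and a deliberately naive 1-nearest-neighbour predictor. Leaving out single measurements lets the replicate of each held-out sample stay in the training set, while leaving out whole specimens does not.

```python
def predict_1nn(x, train):
    """Return the y of the training pair whose x is closest to the query."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def rmse(errors):
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

def cv_rmse(data, groups, by_group):
    """Leave-one-out RMSE, holding out either one sample or its whole group."""
    errors = []
    for i, (x, y) in enumerate(data):
        if by_group:
            train = [d for d, g in zip(data, groups) if g != groups[i]]
        else:
            train = [d for j, d in enumerate(data) if j != i]
        errors.append(predict_1nn(x, train) - y)
    return rmse(errors)

# Hypothetical data: specimen g has x close to g and an unrelated response y.
y_values = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0]
data, groups = [], []
for g, y in enumerate(y_values):
    for rep in range(2):  # two replicate measurements per specimen
        data.append((g + 0.001 * rep, y))
        groups.append(g)

leaky = cv_rmse(data, groups, by_group=False)   # replicate stays in the training set
honest = cv_rmse(data, groups, by_group=True)   # whole specimen is held out
print(leaky, honest)
```

The sample-wise estimate looks perfect only because each held-out measurement has its twin in the training data; the group-wise estimate shows that this predictor has no real predictive ability on unseen specimens.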

Green and Rapid Quantification of Ciprofloxacin Hydrochloride and Tylosin Tartrate in Veterinary Formulation using UV Spectrophotometric Method: A Comparative Study of Nature-Inspired Algorithms for Feature Selection

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-03-29 DOI: 10.1002/cem.70023

Mostafa M. Eraqi, Ayman M. Algohary, Youssef O. Al-Ghamdi, Ahmed M. Ibrahim

{"title":"Green and Rapid Quantification of Ciprofloxacin Hydrochloride and Tylosin Tartrate in Veterinary Formulation using UV Spectrophotometric Method: A Comparative Study of Nature-Inspired Algorithms for Feature Selection","authors":"Mostafa M. Eraqi, Ayman M. Algohary, Youssef O. Al-Ghamdi, Ahmed M. Ibrahim","doi":"10.1002/cem.70023","DOIUrl":"10.1002/cem.70023","url":null,"abstract":"Rapid and accurate quantification of ciprofloxacin hydrochloride (CIP) and tylosin tartrate (TYZ) in veterinary formulations is crucial for ensuring product quality and therapeutic efficacy. This study introduces a green and cost-effective analytical method that combines the simplicity of UV spectrophotometry with the optimization power of nature-inspired algorithms for the simultaneous determination of CIP and TYZ in a tablet veterinary formulation. Fourteen nature-inspired algorithms were comparatively assessed using root average squared error (RASE), average absolute error (AAE), and the coefficient of determination (R2). The Corona virus optimization (CVO) algorithm and the Bat algorithm demonstrated superior performance for CIP and TYZ, respectively. The CVO algorithm, optimized for CIP, exhibited RASE, AAE, and R2 values of 0.37, 0.27, and 0.998, respectively, for the calibration set, while the bat algorithm, tailored for TYZ, yielded RASE, AAE, and R2 values of 0.54, 0.41, and 0.984. Test sets yielded RASE, AAE, and R2 values of 0.55, 0.46, and 0.991 for CIP and 0.20, 0.15, and 0.995 for TYZ, respectively, confirming the algorithms' predictive ability. Validation was performed using the accuracy profile approach. The limits of detection (LODs) were determined to be 0.86 μg mL−1 for CIP and 0.36 μg mL−1 for TYZ, while the limits of quantification (LOQs) were calculated as 2.88 μg mL−1 for CIP and 1.21 μg mL−1 for TYZ. The method's environmental impact was comprehensively assessed using The Green Solvent Selection Tool (GSST), The National Environmental Methods Index (NEMI), a modified Eco-Scale, the Modified GAPI (MoGAPI), and a complementary whiteness evaluation via the RGBfast algorithm, confirming its eco-friendly profile. The proposed method demonstrated superior greenness, as reflected in its elevated GSST scores and favorable NEMI assessment. Specifically, the method achieved a modified Eco-Scale score of 84, a MoGAPI score of 81, and a whiteness index of 61, as determined by the RGBfast algorithm. These results confirm the method's environmentally sustainable profile, reinforcing its suitability for green analytical applications. This novel approach offers significant advantages in terms of cost, speed, and environmental sustainability compared to conventional chromatographic techniques, paving the way for more efficient and greener analytical methods in pharmaceutical quality control. Furthermore, this study highlights the innovative integration of UV spectroscopy with nature-inspired algorithms, demonstrating significant advancements over conventional UV methodologies for pharmaceutical analysis.","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143726769","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
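
The three figures of merit named in the abstract above (RASE, AAE, R2) can be computed as sketched below. The abstract does not spell out its formulas, so this assumes the usual definitions: RASE as the square root of the mean squared residual, AAE as the mean absolute residual, and R2 as one minus the ratio of residual to total sum of squares. The concentrations in the example are made up for illustration and are not the paper's data.

```python
import math

def rase(y_true, y_pred):
    """Root average squared error: sqrt(mean((y - y_hat)^2))."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def aae(y_true, y_pred):
    """Average absolute error: mean(|y - y_hat|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical reference and predicted concentrations (arbitrary units).
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]
print(rase(y_true, y_pred), aae(y_true, y_pred), r2(y_true, y_pred))
```

RASE and AAE are in the units of the response, so they are only comparable between analytes measured on the same scale, whereas R2 is unitless.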

Foreword for Special Issue Devoted to the 14th Winter Symposium on Chemometrics (2024)

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-03-25 DOI: 10.1002/cem.70022

Anastasiia Surkova, Dmitry Kirsanov

{"title":"Foreword for Special Issue Devoted to the 14th Winter Symposium on Chemometrics (2024)","authors":"Anastasiia Surkova, Dmitry Kirsanov","doi":"10.1002/cem.70022","DOIUrl":"10.1002/cem.70022","url":null,"abstract":"The 14th Winter Symposium on Chemometrics (WSC14) was held in Tsaghkadzor (Armenia) from 26 February to 1 March 2024. The WSC is a biannual international meeting series started in Russia in 2002. Since that time WSC became an important event that is well known among other chemometric meetings for its friendly and relaxed atmosphere, rich social program and consistently high quality of scientific presentations. The scope of WSC meetings covers all relevant topics in modern chemometrics, both in theoretical developments and practical applications. In 2024, the conference was held under the auspices of the Armenian Academy of Sciences. Thirty-six participants from eight countries took part in the meeting, and the scientific program contained six lectures, 16 talks and 17 poster presentations. The invited lectures were delivered by Prof. Douglas N. Rutledge (France), Prof. Stefan Tsakovski (Bulgaria), Prof. Hadi Parastar (Iran) and Prof. Xihui Bian (China). Key lectures were presented by Dr. Alexey Pomerantsev and Dr. Oxana Rodionova. The variety of presentation topics included applications of near infrared spectrometry, hyperspectral imaging, QSPR, aquaphotomics, multiblock data analysis, machine learning, and deep learning. The conference venue was located in a spectacular place near the Tsaghkadzor ski resort and as a part of the sportive program the participants were able to enjoy skiing in beautiful Armenian mountains. Traditional evening gatherings, so called “scores and loadings,” were conducted every conference evening with guitar playing, singing and informal discussions on all possible topics, either highly scientific or deeply prosaic. The last day of the conference was devoted to the guided tours to Sevan Lake with ancient Sevanavank monastery and to Yerevan city, the capital of hospitable Armenia. The WSC meetings are always very friendly to young scientists, offering a Best Young Scientist award; this year the prize was the registration for CAC-2024 (Chemometrics in Analytical Chemistry) in Argentina. The respected jury of senior chemometricians decided to award Dr. Ekaterina Boichenko for her talk “Near-infrared spectroscopy and chemometrics: a promising combination for real-time and nondestructive classification of urinary stones.” Three best poster prizes were awarded to Anastasia Sholokhova, Dr. Maria Khaydukova, and Dr. Larisa Lvova. If the feedback from participants is to be believed, all in all it was an enjoyable event. The place and the time for WSC15 will be announced soon. Organizing committee of the 14th WSC.","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 4","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70022","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143690125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Multi-Block Chemometric Approaches to the Unsupervised Spectral Characterization of Geological Samples

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-03-16 DOI: 10.1002/cem.70010

Beatriz Galindo-Prieto, Ian S. Mudway, Johan Linderholm, Paul Geladi

{"title":"Multi-Block Chemometric Approaches to the Unsupervised Spectral Characterization of Geological Samples","authors":"Beatriz Galindo-Prieto, Ian S. Mudway, Johan Linderholm, Paul Geladi","doi":"10.1002/cem.70010","DOIUrl":"10.1002/cem.70010","url":null,"abstract":"As an example for the potential use of multi-block chemometric methods to provide improved unsupervised characterization of compositionally complex materials through the integration of multi-modal spectrometric data sets, we analysed spectral data derived from five field instruments (one XRF, two NIR, and two FT-Raman), collected on 76 bedrock samples of diverse composition. These data were analysed by single- and multi-block latent variable models, based on principal component analysis (PCA) and partial least squares (PLS). For the single-block approach, PCA and PLS models were generated; whilst hierarchical partial least squares (HPLS) regression was applied for the multi-block modelling. We also tested whether dimensionality reduction resulted in a more computationally efficient multi-block HPLS model with enhanced model interpretability and geological characterization power using the variable influence on projection (VIP) feature selection method. The results showed differences in the characterization power of the five spectrometer data sets for the bedrock samples based on their mineral composition and geological properties; moreover, some spectroscopic techniques under-performed for distinguishing samples by composition. The multi-block HPLS and its VIP-strengthened model yielded a more complete unsupervised geological grouping of the samples in a single parsimonious model. We conclude that multi-block HPLS models are effective at combining multi-modal spectrometric data to provide a more comprehensive characterization of compositionally complex samples, and VIP can reduce HPLS model complexity, while increasing its data interpretability. These approaches have been applied here to a geological data set, but are amenable to a broad range of applications across chemical and biomedical disciplines.","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 3","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70010","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143632623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Fast Partition-Based Cross-Validation With Centering and Scaling for $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-03-13 DOI: 10.1002/cem.70008

Ole-Christian Galbo Engstrøm, Martin Holm Jensen

{"title":"Fast Partition-Based Cross-Validation With Centering and Scaling for $\\mathbf{X}^{\\mathbf{T}}\\mathbf{X}$ and $\\mathbf{X}^{\\mathbf{T}}\\mathbf{Y}$","authors":"Ole-Christian Galbo Engstrøm, Martin Holm Jensen","doi":"10.1002/cem.70008","DOIUrl":"10.1002/cem.70008","url":null,"abstract":"We present algorithms that substantially accelerate partition-based cross-validation for machine learning models that require matrix products $\\mathbf{X}^{\\mathbf{T}}\\mathbf{X}$ and $\\mathbf{X}^{\\mathbf{T}}\\mathbf{Y}$. Our algorithms have applications in model selection for, for example, principal component analysis (PCA), principal component regression (PCR), ridge regression (RR), ordinary least squares (OLS), and partial least squares (PLS). Our algorithms support all combinations of column-wise centering and scaling of $\\mathbf{X}$ and $\\mathbf{Y}$, and we demonstrate in our accompanying implementation that this adds only a manageable, practical constant over efficient variants without preprocessing. We prove the correctness of our algorithms under a fold-based partitioning scheme and show that the running time is independent of the number of folds; that is, they have the same time complexity as that of computing $\\mathbf{X}^{\\mathbf{T}}\\mathbf{X}$ and …","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 3","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.70008","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143602456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
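
The basic identity that makes fold-independent running time plausible can be sketched for the simplest case. Because the matrix product over all rows is a sum of per-row outer products, the training-set product for a fold equals the full product minus the product over that fold's validation rows. The sketch below covers only this uncentered, unscaled case; handling column-wise centering and scaling correctly per fold is the paper's contribution and is not reproduced here. The helper names and the toy matrices are assumptions made for illustration.

```python
def gram(rows_a, rows_b):
    """Return A^T B for two row-major lists of rows with the same row count."""
    p, q = len(rows_a[0]), len(rows_b[0])
    out = [[0.0] * q for _ in range(p)]
    for ra, rb in zip(rows_a, rows_b):
        for i in range(p):
            for j in range(q):
                out[i][j] += ra[i] * rb[j]
    return out

def subtract(m1, m2):
    return [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def fold_products(X, Y, folds):
    """Yield (fold, XtX_train, XtY_train) using one pass for the full products.

    Each fold only touches its own validation rows, so the total work does not
    grow with the number of folds.
    """
    XtX, XtY = gram(X, X), gram(X, Y)
    for f in sorted(set(folds)):
        Xv = [x for x, g in zip(X, folds) if g == f]
        Yv = [y for y, g in zip(Y, folds) if g == f]
        yield f, subtract(XtX, gram(Xv, Xv)), subtract(XtY, gram(Xv, Yv))

# Hypothetical toy data: 4 samples, 2 predictors, 1 response, 2 folds.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
Y = [[1.0], [0.0], [2.0], [1.0]]
folds = [0, 1, 0, 1]

for f, xtx, xty in fold_products(X, Y, folds):
    print(f, xtx, xty)
```

With centering or scaling, the training-set means and standard deviations change from fold to fold, so simple subtraction is no longer enough on its own; that is the part this sketch deliberately leaves out.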

Getting Insights Into Chromatographic Properties of HILIC and Mixed-Mode Homemade Stationary Phases Using Principal Component and Cluster Analyses

IF 2.1, Zone 4 (Chemistry)

Journal of Chemometrics Pub Date : 2025-03-12 DOI: 10.1002/cem.70019

A. Shemiakina, M. Khrisanfov, N. Chikurova, A. Samokhin, A. Chernobrovkina

{"title":"Getting Insights Into Chromatographic Properties of HILIC and Mixed-Mode Homemade Stationary Phases Using Principal Component and Cluster Analyses","authors":"A. Shemiakina, M. Khrisanfov, N. Chikurova, A. Samokhin, A. Chernobrovkina","doi":"10.1002/cem.70019","DOIUrl":"10.1002/cem.70019","url":null,"abstract":"In this work, we compared the chromatographic properties of 27 homemade monomer- and polymer-modified stationary phases synthesized via the Ugi reaction for hydrophilic interaction liquid chromatography (HILIC). These stationary phases along with the unmodified substrate were characterized by retention factors of 33 polar biologically active compounds belonging to various classes (nucleobases/nucleosides, sugars, carboxylic acids, and water-soluble vitamins). Additionally, the widely used Tanaka HILIC test was performed. The experimental data from both characterization approaches were processed using several chemometric techniques, including principal component analysis (PCA), hierarchical cluster analysis (HCA), and the K-means algorithm. It was initially expected that polymer-modified phases would differ significantly from monomer-modified ones due to their mixed-mode properties. This was confirmed by the clear separation of these two types of stationary phases on the PCA score plot obtained for binary logarithms of selectivities (calculated from all 33 retention factors). Dissimilarities observed among some monomer-modified stationary phases resulted in insights into Ugi reaction conditions suitable for obtaining adsorbents with distinct chromatographic properties. Each class of test compounds required specific mobile phase composition to achieve reasonable chromatographic characteristics, such as retention times and peak shapes. To exclude the long-lasting re-equilibration stage associated with mobile phase changes, a smaller set of only three test compounds was proposed, yielding nearly the same clustering results as the complete dataset. This simplified procedure can facilitate the rapid characterization of newly synthesized stationary phases and allow for comparison with previously studied phases.","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"39 3","pages":""},"PeriodicalIF":2.1,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143595366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0