{"title":"Evaluation of Classical Least Squares Discriminant Analysis (CLS-DA) as a Novel Supervised Pattern Recognition Technique","authors":"Somaye Vali Zade, Hamid Abdollahi","doi":"10.1002/cem.3609","DOIUrl":"https://doi.org/10.1002/cem.3609","url":null,"abstract":"<div><p>Multivariate calibration techniques and machine learning algorithms are inextricably linked within the realm of chemometrics and data analysis. Classical least squares (CLS) modeling, a fundamental multivariate regression approach, has traditionally been utilized for quantitative analysis tasks, establishing relationships between predictor variables (e.g., spectroscopic data) and response variables (e.g., chemical concentrations). However, a unique feature of CLS is its ability to handle scenarios with partial knowledge of the independent variable matrix, making it an intriguing candidate for qualitative pattern recognition and discriminant analysis applications. This study proposes a novel approach, Classical Least Squares Discriminant Analysis (CLS-DA), which combines the principles of CLS modeling with discriminant analysis objectives. The performance of CLS-DA is comprehensively evaluated using two real-world datasets: chemical analysis of three wine cultivars and mid-infrared spectroscopy of minced meat samples (pork, chicken, and turkey). The results are compared against the well-established Partial Least Squares Discriminant Analysis (PLS-DA) method, a widely adopted technique for classification tasks in chemometrics. For both sets of experimental data, CLS-DA and PLS-DA showed comparable efficiency. For the classification of three types of wine, the accuracy of the proposed method was 94.3%, while the accuracy of the reference method was 98.1%. For the classification of minced meat samples, the accuracies of CLS-DA and PLS-DA were 97.2% and 94%, respectively, across all three groups. The findings demonstrate the potential of CLS-DA as a straightforward and interpretable supervised pattern recognition technique, exhibiting comparable classification performance to PLS-DA. The study highlights the advantages of CLS-DA, including its ability to operate within the original data space and its flexibility in accommodating partial knowledge scenarios. The proposed CLS-DA approach presents a promising alternative for discriminant analysis, offering new perspectives on the applications of classical least squares modeling in chemometrics.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142862344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near-Infrared Spectroscopy and Aquaphotomics in Cancer Research: A Pilot Study","authors":"Anastasiia Surkova, Ekaterina Boichenko, Olga Bibikova, Viacheslav Artyushenko, Jelena Muncan, Roumiana Tsenkova","doi":"10.1002/cem.3600","DOIUrl":"https://doi.org/10.1002/cem.3600","url":null,"abstract":"<div><p>Currently, the majority of methods to monitor cancer treatment through the analysis of body fluids are based on highly selective detection of single molecules or cells. In this study, we consider the analysis of the aqueous medium of liquid samples, that is, the water itself, using aquaphotomics and near-infrared (NIR) spectroscopy for spectral data acquisition and processing within cancer research. Water, as a molecular system, is a rich source of information about the current state of a patient, which can be extracted from near-infrared spectra of liquid samples via simple algorithms based on multivariate data analysis. The reported results, obtained ex vivo from body fluids, demonstrate the potential of aquaphotomics in cancer research.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142862257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resampling as a Robust Measure of Model Complexity in PARAFAC Models","authors":"Helene Fog Froriep Halberg, Marta Bevilacqua, Åsmund Rinnan","doi":"10.1002/cem.3601","DOIUrl":"10.1002/cem.3601","url":null,"abstract":"<p>Fluorescence spectroscopy has been applied for analysis of complex samples, such as food and beverages. Parallel factor analysis (PARAFAC) is a well-known decomposition method for fluorescence excitation–emission matrices (EEMs). When the complexity of the system increases, it becomes considerably more difficult to determine the optimal number of PARAFAC components, especially when the fluorophores of the system are unknown. The two commonly applied diagnostics, core consistency and split-half analysis, appear to underestimate the model complexity due to covarying components and local minima, respectively. As a more robust alternative, we propose a resampling approach with multiple initializations and submodel comparisons for estimating the optimal number of PARAFAC components in complex data.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3601","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Non-Linear Model for Multiple Alcohol Intakes and Optimal Designs Strategies","authors":"Irene Mariñas-Collado, Juan M. Rodríguez-Díaz, M. Teresa Santos-Martín","doi":"10.1002/cem.3599","DOIUrl":"10.1002/cem.3599","url":null,"abstract":"<p>This study addresses the complex dynamics of alcohol elimination in the human body, a topic of considerable importance in forensic and healthcare settings. Existing models often oversimplify by assuming linear elimination kinetics, which limits their practical application. This study presents a novel non-linear model for estimating blood alcohol concentration after multiple intakes. Initially developed for two separate alcohol intakes, it can be straightforwardly extended to the case of more intakes. Because accurate parameter estimation depends on a well-chosen experimental design, the study applies optimal experimental design (OED) methodologies. Sensitivity analysis of model coefficients and the determination of D-optimal designs, considering correlation structures among observations, reveal a strong linear relationship between support points. This relationship can be used to obtain nearly optimal designs that are highly efficient and much easier to compute.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3599","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Population Power Curves in ASCA With Permutation Testing","authors":"José Camacho, Michael Sorochan Armstrong","doi":"10.1002/cem.3596","DOIUrl":"10.1002/cem.3596","url":null,"abstract":"<p>In this paper, we revisit the power curves in ANOVA simultaneous component analysis (ASCA) based on permutation testing and introduce the population curves derived from population parameters describing the relative effect among factors and interactions. The relative effect has important practical implications: The statistical power of a given factor depends on the design of other factors in the experiment and not only on the sample size. Thus, understanding the relative power in a specific experimental design can be extremely useful to maximize our chances of success when planning the experiment. In the paper, we derive relative and absolute population curves, where the former represent statistical power in terms of the normalized effect size between structure and noise, and the latter in terms of the sample size. Both types of population curves allow us to make decisions regarding the number and nature (fixed/random) of factors, their relationships (crossed/nested), and the number of levels and replicates, among others, in a multivariate experimental design (e.g., an omics study) during the planning phase of the experiment. We illustrate both types of curves through simulation.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3596","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chemometric Classification of Motor Oils Using 1H NMR Spectroscopy With Simultaneous Phase and Baseline Optimization","authors":"A. Olejniczak, J. P. Łukaszewicz","doi":"10.1002/cem.3598","DOIUrl":"10.1002/cem.3598","url":null,"abstract":"<div><p>Here, we demonstrate mid-field <sup>1</sup>H NMR spectroscopy combined with chemometrics to be powerful in the classification and authentication of motor oils (MOs). The <sup>1</sup>H NMR data were processed with a new algorithm for simultaneous phase and baseline correction, which, for crowded spectra such as those of the refinery products, allowed for more accurate estimation of phase parameters than other literature approaches tested. A principal component analysis (PCA) model based on the unbinned CH<sub>3</sub> fingerprint region (0.6–1.0 ppm) enabled the differentiation of hydrocracked and poly-α-olefin-based MOs and was effective in resolving mixtures of these base stocks with conventional base oils. PCA analysis of the 1.0- to 1.14-ppm region enabled the detection of poly(isobutylene) additive and was useful for differentiating between single-grade and multigrade MOs. Non-equidistantly binned <sup>1</sup>H NMR data were used to detect the addition of esters and to establish discriminant models for classifying MOs by viscosity grade and by major categories of synthetic, semisynthetic, and mineral oils. The performances of four classifiers (linear discriminant analysis [LDA], quadratic discriminant analysis [QDA], naïve Bayes classifier [NBC], and support vector machine [SVM]) with and without PCA dimensionality reduction were compared. In both tasks, SVM showed the best efficiency, with average error rates of ~2.3% and 8.15% for predicting major MO categories and viscosity grades, respectively. The potential to merge spectra collected from different NMR instruments is discussed for models based on spectral binning. It is also shown that small errors in phase parameters are not detrimental to binning-based PCA models.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 12","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Some Views on Multi-criteria Methods for Data Analysis","authors":"Henk A. L. Kiers, Marieke E. Timmerman","doi":"10.1002/cem.3597","DOIUrl":"10.1002/cem.3597","url":null,"abstract":"<p>Many data analysis methods actually combine optimization of several criteria. In this paper, a framework is offered for categorizing such multi-criteria methods. In particular, it categorizes multiset and three-way analysis methods as well as penalized methods and combinations thereof. The framework aims to stimulate critical evaluation of methods and reflection on the purpose of methods and, by signaling gaps, to help the development of new data analysis methods.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 11","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3597","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Automated System for Early Diabetic Retinopathy Detection and Severity Classification","authors":"Santoshkumar S Ainapur, Virupakshappa Patil","doi":"10.1002/cem.3593","DOIUrl":"https://doi.org/10.1002/cem.3593","url":null,"abstract":"<div><p>Diabetes is a common and serious global disease that damages blood vessels in the eye, leading to vision loss. Early and accurate diagnosis of this issue is crucial to reduce the risk of visual impairment. The typical deep learning (DL) methods for diabetic retinopathy (DR) grading are often time-consuming and yield unsatisfactory detection performance due to inadequate representation of lesion features. To overcome these challenges, this research proposes a new automated mechanism for detecting and classifying DR, aiming to identify DR severities and different stages. To identify and capture feature characteristics from DR samples, a conjugated attention mechanism and vision transformer are utilized within a collective net model, which automatically generates feature maps for diagnosing DR. These extracted feature maps are then fused through the feature fusion function in a fused attention net model, calculating attention weights to produce the most powerful feature map. Finally, the DR cases are identified and discriminated using the kernel extreme learning machine (KELM) model. For evaluating DR severity, our work utilizes four different benchmark datasets: APTOS 2019, MESSIDOR-2, DiaRetDB1 V2.1, and DIARETDB0. To eliminate data noise and unwanted variations, two preprocessing steps are carried out: contrast enhancement and illumination correction. The experimental results, evaluated using well-known indicators, demonstrate that the suggested method achieves an accuracy of 99.63%, higher than that of the baseline methods. This research contributes to the development of powerful DR screening techniques that are less time-consuming and capable of automatically identifying DR severity levels at an early stage.</p></div>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 11","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142666139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can Angle Measures Be Useful in MCR Analyses?","authors":"Klaus Neymeyr, Martina Beese, Hamid Abdollahi, Mathias Sawall","doi":"10.1002/cem.3582","DOIUrl":"10.1002/cem.3582","url":null,"abstract":"<p>In MCR analyses, the similarity of pairs of spectra or concentration profiles can be measured in terms of the acute angle that is enclosed by the representing vectors. Acute angles between vectors can be generalized to pairs of subspaces. So-called canonical angles, also called principal angles, measure the mutual orientation of a pair of subspaces. This work discusses how angles and canonical angles can support multivariate curve resolution analyses. A canonical angle analysis (CAA) can help to detect changes in the chemical composition during a chemical reaction in a way comparable to, but different from, evolving factor analysis (EFA).</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 11","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3582","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142209896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flexible Trilinearity Alignment (FTA) and Shift Invariant Transformation (SIT) Constraints in Three-Way Multivariate Curve Resolution Data Analysis","authors":"Xin Zhang, Romà Tauler","doi":"10.1002/cem.3581","DOIUrl":"10.1002/cem.3581","url":null,"abstract":"<p>In this work, two alternative ways of analyzing three-way data with multivariate curve resolution alternating least squares (MCR-ALS) using the trilinearity constraint are described and compared. Different synthetic datasets and experimental three-way datasets covering different scenarios are analyzed, and the results obtained are compared. The two new ways of applying the trilinearity constraint are named flexible trilinearity alignment (FTA) and shift invariant transformation (SIT). The effects of noise in the application of both types of constraints are investigated in detail. Results show that both approaches are particularly adequate for cases such as gas chromatography and especially liquid chromatography, where the elution profiles of the same chemical component in different chromatographic runs are not totally reproducible because they are time shifted, although they preserve their shape. When strong time shifts and co-elution occur, the “standard” trilinear model does not work, and alternative approaches should be used, such as the MCR bilinear model extended to multiset (multirun) data, or the proposed relaxation of the trilinearity constraint in the FTA and SIT methods to capture the time drift changes produced in the elution profiles of the resolved components.</p>","PeriodicalId":15274,"journal":{"name":"Journal of Chemometrics","volume":"38 11","pages":""},"PeriodicalIF":2.3,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cem.3581","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141928166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}