{"title":"Application of multi-omics fusion technique based on Raman spectroscopy and metabolomics in early diagnosis and activity prediction of systemic lupus erythematosus","authors":"Pei Liu , Xuguang Zhou , Xiaoyi Lv , Cheng Chen , Xiaomei Chen , Cainan Luo , Xue Wu , Chen Chen , Lijun Wu","doi":"10.1016/j.chemolab.2025.105513","DOIUrl":"10.1016/j.chemolab.2025.105513","url":null,"abstract":"<div><div>The combination of artificial intelligence and Raman spectroscopy provides new ideas and methods for the auxiliary diagnosis of diseases. However, in systemic lupus erythematosus (SLE), pathological similarity is high and spectral information overlaps heavily, so a single spectral omics modality cannot achieve satisfactory results. Metabolomics, by contrast, directly reflects the metabolic status of an organism and offers in-depth insight into its physiological and pathological states. Meanwhile, multi-omics fusion can effectively integrate features from different omics levels. Therefore, this study proposes, for the first time, a Multi-omics Decoupling-Bipartite Attentional Weighting (MDBAW) fusion model based on Raman spectroscopic omics and metabolomics data. The model accounts for both the unique and the shared representations of the two omics, and adds attention-weighting modules at the input and output to assign greater weight to the most informative features of each omics modality. Experimental results on three datasets showed that the MDBAW model outperforms single-omics models and other advanced multi-omics fusion models, and effectively improves the accuracy of SLE diagnostic classification and activity prediction. 
In addition, correlation analysis between the Raman spectroscopic omics and metabolomics data, together with KEGG pathway analysis, verified the interpretability of fusing these two omics for auxiliary disease diagnosis and demonstrated the ability of Raman spectroscopy to detect metabolites.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"266 ","pages":"Article 105513"},"PeriodicalIF":3.8,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144893420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"All sparse PCA models are wrong, but some are useful. Part III: Model interpretation","authors":"J. Camacho , A.K. Smilde , E. Saccenti , J.A. Westerhuis , R. Bro","doi":"10.1016/j.chemolab.2025.105498","DOIUrl":"10.1016/j.chemolab.2025.105498","url":null,"abstract":"<div><div>Sparse Principal Component Analysis (sPCA) is a popular matrix factorization that combines variance maximization and sparsity with the ultimate goal of improving data interpretation. In this series of papers, we show that the factorization with sPCA can be complex to interpret even when confronted with simple data. In the first paper in this series, we demonstrated that sPCA models have limitations with respect to factorizing sparse and noise-free data accurately when loadings are overlapping. In the second paper, we showed that sPCA algorithms based on deflation can generate artifacts in higher-order components. We also showed that score orthogonalization and the incorporation of orthonormal loadings are suitable means to avoid large artifacts. Both approaches constrain the set of possible sPCA solutions in a very similar but poorly understood way. In this paper, we study in particular the sPCA solution by Zou et al., which, according to our results, represents the best sPCA algorithm of those considered in the series. 
Here, we provide new derivations of the model equations, the computation and interpretation of the model parameters, and the selection of metaparameters in practical cases, making sPCA an even more powerful data modeling tool.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"266 ","pages":"Article 105498"},"PeriodicalIF":3.8,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144864974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coal origin identification based on visible-infrared spectroscopy and attention networks","authors":"Jingyi Liu , Ba Tuan Le , Thai Thuy Lam Ha","doi":"10.1016/j.chemolab.2025.105501","DOIUrl":"10.1016/j.chemolab.2025.105501","url":null,"abstract":"<div><div>Coal origin identification is a crucial process in the coal industry, important for ensuring coal quality and optimizing supply chain management. However, due to the diversity of coal mine resources and increasing market demands for quality, coal origin identification has become more complex. This study proposes a coal origin identification method that combines spectroscopy with advanced machine learning techniques based on deep attention networks. Through an improved model architecture and optimization strategy, the method achieves efficient classification and precise recognition of coal samples. Built around an attention network, the method fully exploits the latent spectral features of coal samples. Experimental results show that, compared with traditional methods, this method achieves significant improvements on multiple key metrics, verifying its superior performance and application potential. 
This study not only provides an efficient and reliable solution for coal origin identification, but also provides important support for the intelligent and precise development of the coal industry.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"266 ","pages":"Article 105501"},"PeriodicalIF":3.8,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144886077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Orthogonal long short-term memory autoencoder for semi-supervised soft sensor modeling","authors":"Fangyuan Ma , Cheng Ji , Jingde Wang , Wei Sun , Jose A. Romagnoli","doi":"10.1016/j.chemolab.2025.105499","DOIUrl":"10.1016/j.chemolab.2025.105499","url":null,"abstract":"<div><div>Data-driven soft sensor methods are widely applied to predict hard-to-measure variables in industrial production processes. In practice, however, the number of labeled samples is limited, which affects the accuracy of the developed soft sensors. To address this, semi-supervised soft sensor methods have been proposed that combine unsupervised feature extraction with supervised establishment of mapping correlations. The autoencoder (AE) is a commonly used feature extraction method that effectively captures the nonlinear features of processes from unlabeled data. Since typical AEs impose no special constraints on the latent-space output, the extracted features may be redundant, which increases the complexity of establishing the mapping correlation. Moreover, typical AEs have difficulty extracting the dynamic features of processes. Both issues can degrade soft sensor performance. To address these issues, an Orthogonal Long Short-Term Memory Autoencoder (OLAE) is proposed in this work. By adding an orthogonality constraint on the latent-space output to the loss function of a Long Short-Term Memory Autoencoder, orthogonal dynamic features can be obtained. The OLAE is then employed in the feature extraction stage. Using Chatterjee's new correlation coefficient, the orthogonal features related to the hard-to-measure variables are selected for establishing the mapping correlation. Considering the limited number of labeled samples, a prediction model based on support vector regression is established to predict the hard-to-measure variables. 
Data from a penicillin fermentation process and an industrial cracking furnace are investigated to evaluate the effectiveness of the proposed soft sensor method.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"265 ","pages":"Article 105499"},"PeriodicalIF":3.8,"publicationDate":"2025-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144773027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural reinforcement-oriented hyperspectral image compression: Adaptive approaches for enhanced quality","authors":"R. Nagendran , Sudhir Ramadass , K. Thilagvathi , Ananda Ravuri","doi":"10.1016/j.chemolab.2025.105495","DOIUrl":"10.1016/j.chemolab.2025.105495","url":null,"abstract":"<div><div>Hyperspectral images (HSIs) typically contain hundreds or even thousands of bands covering a wide range of wavelengths, each carrying the material's spectral and spatial properties. New developments in remote sensing technology (RST) have enabled HSIs with higher spectral and spatial resolution. However, the huge dimensionality and computational complexity of these images pose difficulties for researchers. Broad spectral coverage and band redundancy contribute to the high dimensionality, while high spectral resolution, varying sample ratios, and data dimensionality cause computational complexity, which lowers precision and increases processing cost. To address this challenge, a novel Neural Reinforcement with Adaptable Compression (NRAC) approach is proposed for dimension reduction in HSIs. The proposed technique first computes the mean for each instance in the input image and generates a matrix to select relevant bands from the HSI. A dilation step is then applied to mitigate intensity fluctuations in the HSI. Next, a Convolutional CodeMapper is introduced to extract spatial and spectral features through pixel-value localization, thereby reducing computational overhead and overfitting. As a result, the NRAC approach accurately reconstructs the original HSI while reducing its dimensionality. 
The evaluation used the Indian Pines, Jasper Ridge, Cuprite, and Pavia University datasets, on which NRAC attained favorable Peak Signal-to-Noise Ratio (PSNR), Mean Spectral Reconstruction Error (MSRE), Structural Similarity Index (SSIM), Feature Similarity Index (FSIM), and compression ratio values, demonstrating the efficacy of the proposed technique relative to prior methods. NRAC's capacity to interpret hyperspectral data has practical implications for remote sensing, agriculture, and environmental monitoring applications.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"266 ","pages":"Article 105495"},"PeriodicalIF":3.8,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144890406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Infrared spectrum difference analysis and rapid identification of Paris polyphylla var. yunnanensis from different geographical origin","authors":"Yangna Feng , Yuanzhong Wang","doi":"10.1016/j.chemolab.2025.105496","DOIUrl":"10.1016/j.chemolab.2025.105496","url":null,"abstract":"<div><div><em>Paris polyphylla</em> var. <em>yunnanensis</em> (PPY) is an important medicinal plant resource, but its quality is greatly affected by the growing environment. Material of differing quality is often mixed on the market, and it is difficult to distinguish good from poor quality. Rapid traceability of the geographical origin of PPY is therefore of great significance for the safety and efficacy of medication. In this study, the spectral differences among PPY from different origins were investigated through analysis of conventional Fourier transform infrared spectroscopy (FT-IRS), second-derivative infrared spectroscopy (SD-IRS), and two-dimensional correlation spectroscopy (2DCOS) images. Hierarchical cluster analysis (HCA) and principal component analysis (PCA) were used for a preliminary exploration of how PPY samples from different places cluster. Six machine learning algorithms, namely partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), extreme learning machine (ELM), decision tree (DT), back propagation neural network (BPNN), and residual convolutional neural network (ResNet), were then used to trace the origin of PPY, with the aim of providing a range of methodological references for research on PPY from different places. The results showed that FT-IRS could characterize the differences among PPY from different places; PLS-DA, SVM, and ResNet all obtained good results, and the ResNet model reached 100 % accuracy. The performance of the other models may be related to the size of the dataset. 
The results of this study can promote rapid quality detection of PPY and help ensure its safety as a medicinal drug.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"265 ","pages":"Article 105496"},"PeriodicalIF":3.8,"publicationDate":"2025-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144748782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhanced predictive accuracy of pancreatic ductal adenocarcinoma staging: A synergistic approach merging machine learning algorithms with metabolic profiling","authors":"Boqiang Liao , Junqi Huang , Honghai Chen, Feng Xia, Pengfei Guo, Ge Song, Jianghua Feng, Guiping Shen","doi":"10.1016/j.chemolab.2025.105497","DOIUrl":"10.1016/j.chemolab.2025.105497","url":null,"abstract":"<div><div>Early diagnosis and treatment are pivotal for enhancing the survival rates of pancreatic cancer patients, emphasizing the necessity for precise staging of pancreatic ductal adenocarcinoma (PDAC). This study presents a hybrid model that combines convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and traditional machine learning (ML) methods to predict PDAC staging based on metabolic characteristics. To address the data imbalance in PDAC datasets, the adaptive synthetic (ADASYN) sampling algorithm was utilized to augment minority class samples. The CNN-LSTM-ML hybrid model was developed and its performance was evaluated against traditional classification methods. The hybrid model achieved an optimal classification accuracy of 90.00 %, surpassing the performance of traditional methods. The confusion matrix indicated 100 % prediction accuracy for PDAC-I and PDAC-IV stages, and 66.67 % and 83.33 % for PDAC-II and PDAC-III stages, respectively. Validation across datasets with varying degrees of malnutrition confirmed the model's reliability. 
These results demonstrated the excellent predictive performance of the CNN-LSTM-ML hybrid model and its potential applicability to staging prediction in other clinical conditions, contributing to the advancement of precision and personalized medical interventions.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"265 ","pages":"Article 105497"},"PeriodicalIF":3.7,"publicationDate":"2025-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144712938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SCPD: Splitting-based constrained parallel factor decomposition for fluorescence spectroscopy analysis","authors":"Ke Wang , Ban-teng Liu","doi":"10.1016/j.chemolab.2025.105487","DOIUrl":"10.1016/j.chemolab.2025.105487","url":null,"abstract":"<div><div>Fluorescence spectroscopy offers a reliable and fast means of obtaining quantitative and qualitative information in modern industrial processes. A common problem in fluorescence spectroscopy analysis is how to extract and represent the latent structure of the original data in tensor form. However, existing methods have difficulty efficiently obtaining precise solutions to constrained least-squares problems. In this paper, a new splitting-based constrained decomposition algorithm is proposed, building on parallel factor analysis and the alternating direction method of multipliers. Combined with parameter selection strategies, this distributed algorithm is shown to be suitable for parallel implementation and to have good convergence properties. Experiments on synthetic and real-world data indicate its potential utility in fluorescence spectroscopy analysis and other application domains.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"265 ","pages":"Article 105487"},"PeriodicalIF":3.7,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144711453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ANOVA-based Taguchi approach to optimise the rate of heat transport for the magnetised nanofluid across an exponential surface","authors":"J.K. Madhukesh , G.K. Ramesh","doi":"10.1016/j.chemolab.2025.105494","DOIUrl":"10.1016/j.chemolab.2025.105494","url":null,"abstract":"<div><div>A reliable design optimization tool, the Taguchi method has been used to optimise system characteristics and boost performance in various kinds of engineering applications. In this study, we use the Taguchi technique to examine how to optimise nanofluid flow over an exponential surface. Additional effects, namely a magnetic field, radiation, pollutant dispersion, and the Smoluchowski temperature condition at the boundary, are incorporated. Using appropriate transformations, the modeled partial differential equations (PDEs) are converted into dimensionless ordinary differential equations (ODEs), and the resulting coupled ODEs are solved numerically using the Adams-Bashforth-Moulton method. The effects of the physical parameters on temperature, velocity, and concentration are studied in detail, and the numerical results for each pertinent physical parameter are presented graphically. Additionally, to optimise the system's heat transfer with respect to specific parameters, the Taguchi optimization technique is used in conjunction with Analysis of Variance (ANOVA) and multivariate regression analysis. Experiment number 13, with a signal-to-noise ratio (SNR) of 11.3667, has the highest Nusselt number (<em>Nu</em>) and the best conditions for heat transmission. 
Experiment number 16 has the lowest <em>Nu</em>, with an SNR of 2.6519.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"265 ","pages":"Article 105494"},"PeriodicalIF":3.7,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144713377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A unifying framework for modelling non-negative bi-linear, tri-linear and “in-between” data in chemometrics. Part I: Theoretical framework and concepts","authors":"Paul-Albert Schneide , Neal Gallagher , Jesper Løve Hinrich , Rasmus Bro , Romà Tauler","doi":"10.1016/j.chemolab.2025.105492","DOIUrl":"10.1016/j.chemolab.2025.105492","url":null,"abstract":"<div><div>In chemometrics, extracting chemically meaningful information from multi-way analytical data is often challenged by deviations from the ideal tri-linear structure of the chemical information. This work introduces a novel modeling approach based on (1, <span><math><mrow><msub><mi>L</mi><mi>r</mi></msub></mrow></math></span>, <span><math><mrow><msub><mi>L</mi><mi>r</mi></msub></mrow></math></span>) block term decompositions, which flexibly bridges the gap between bi-linear and tri-linear models. The method builds upon the MCR-tri-linearity framework and leverages uniqueness conditions established by De Lathauwer to ensure interpretable factor solutions under practical conditions. A rank-constrained alternating optimization algorithm is proposed to adaptively determine the number of principal components needed for reconstructing varying-mode factors, based on a user-defined reconstruction error tolerance. This adaptive decomposition balances the essential uniqueness of tri-linear models with the flexibility of bi-linear approaches, addressing limitations in both. Simulated data with controlled component ranks demonstrate the method's ability to recover ground-truth factors more accurately than classical tri-linear models, while reducing ambiguity compared to bi-linear models. 
The results confirm that the proposed approach provides an effective framework for analyzing multi-way chemical data with partial or full deviations from tri-linearity, making it a promising tool for a wide range of chemometric applications.</div></div>","PeriodicalId":9774,"journal":{"name":"Chemometrics and Intelligent Laboratory Systems","volume":"265 ","pages":"Article 105492"},"PeriodicalIF":3.8,"publicationDate":"2025-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144773028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}