Seeding multivariate algorithms for spectral analysis, a data augmentation approach to enhance analytical performance
M.E. Keating, H.J. Byrne
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, volume 340, Article 126369. Published 2025-05-10.
Impact factor: 4.3 (JCR Q1, Spectroscopy)
DOI: 10.1016/j.saa.2025.126369
https://www.sciencedirect.com/science/article/pii/S1386142525006754
Citations: 0
Abstract
Seeding of spectral datasets, in which the data matrix is augmented with either the full spectrum or selected spectral features in order to bias multivariate analysis towards the solution of interest, is explored. Such seeding is shown to have a profound effect on the endpoint of the analysis. Using Raman spectroscopic data of human lung adenocarcinoma cells (A549) in vitro, systematic perturbations are introduced to the spectra to simulate dose-dependent exposure to a drug (cisplatin) and/or a cellular response representing reduced viability. Taking Principal Components Analysis (PCA) as the first example, seeding with the known spectral profiles of the drug exposure greatly increases the ability of the algorithm to differentiate two distinct data subsets, representing control and exposed. The improved differentiation is quantified by subsequent Linear Discriminant Analysis of the PCA data. Seeding is also applied to simulated datasets containing simultaneous changes in the spectral markers of exposure dose and cellular response, which are analysed by Multivariate Curve Resolution – Alternating Least Squares (MCR-ALS). In the example presented, adding pure component spectra to the dataset improves the ability of the algorithm both to model the systematic variation of concentration-dependent data and to extract the component spectra, compared with the unseeded dataset. The seeded approach thus provides improved performance for differential analysis of datasets, as well as for spectral unmixing analyses that monitor, for example, the kinetic evolution of a reaction mixture or metabolic pathway.
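The PCA seeding idea described in the abstract can be illustrated with a minimal sketch on synthetic data. Everything here is an assumption made for illustration and is not taken from the paper: the Gaussian band positions, the noise level, the number and weight of the seed rows, the choice to build each seed as the mean spectrum plus the known profile, and the helper names `pca_loadings`, `seed_rows` and `separation`. A simple two-group separation ratio stands in for the Linear Discriminant Analysis step.

```python
import numpy as np

def pca_loadings(X, n_components=2):
    """Return the leading principal-component loadings (as rows) of X."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:n_components]

def seed_rows(X, profile, n_copies=10, weight=1.0):
    """Append n_copies seed spectra to X.

    Each seed is the mean spectrum plus the weighted known profile
    (an assumption of this sketch, not the paper's stated procedure).
    """
    seed = X.mean(axis=0) + weight * profile
    return np.vstack([X, np.tile(seed, (n_copies, 1))])

def separation(scores, labels):
    """Absolute difference of group means over the pooled standard deviation."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled = np.sqrt(0.5 * (a.var(ddof=1) + b.var(ddof=1)))
    return abs(a.mean() - b.mean()) / pooled

def gauss(axis, centre, width):
    """Gaussian band on the given spectral axis."""
    return np.exp(-((axis - centre) / width) ** 2)

rng = np.random.default_rng(0)
shift = np.linspace(400.0, 1800.0, 200)  # Raman shift axis in cm^-1

# Synthetic "cell" spectrum and a hypothetical exposure marker band.
cell = gauss(shift, 1003, 10) + 0.8 * gauss(shift, 1450, 25) + gauss(shift, 1660, 30)
drug = gauss(shift, 780, 15)

# 40 control (label 0) and 40 exposed (label 1) spectra; the exposure
# signal is deliberately weak compared with the noise.
labels = np.repeat([0, 1], 40)
X = cell + 0.05 * labels[:, None] * drug + rng.normal(0.0, 0.05, (80, 200))

pc_plain = pca_loadings(X)[0]
pc_seeded = pca_loadings(seed_rows(X, drug))[0]

# Compare how closely PC1 matches the known profile, and how well the
# original 80 spectra separate when projected onto each PC1.
unit = drug / np.linalg.norm(drug)
Xc = X - X.mean(axis=0)
cos_plain, cos_seeded = abs(pc_plain @ unit), abs(pc_seeded @ unit)
sep_plain = separation(Xc @ pc_plain, labels)
sep_seeded = separation(Xc @ pc_seeded, labels)
print("unseeded: |cos| =", cos_plain, "separation =", sep_plain)
print("seeded:   |cos| =", cos_seeded, "separation =", sep_seeded)
```

In this sketch the seed rows are used only to fit the loadings; the scores that are compared come from the original 80 spectra, so any gain in separation reflects the direction PCA chose rather than the seeds themselves. How strongly to weight the seeds is a tuning choice that the sketch does not attempt to optimise.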
About the journal:
Spectrochimica Acta, Part A: Molecular and Biomolecular Spectroscopy (SAA) is an interdisciplinary journal which spans from basic to applied aspects of optical spectroscopy in chemistry, medicine, biology, and materials science.
The journal publishes original scientific papers that feature high-quality spectroscopic data and analysis. From the broad range of optical spectroscopies, the emphasis is on electronic, vibrational or rotational spectra of molecules, rather than on spectroscopy based on magnetic moments.
Criteria for publication in SAA are novelty, uniqueness, and outstanding quality. Routine applications of spectroscopic techniques and computational methods are not appropriate.
Topics of particular interest to Spectrochimica Acta Part A include, but are not limited to:
Spectroscopy and dynamics of bioanalytical, biomedical, environmental, and atmospheric sciences,
Novel experimental techniques or instrumentation for molecular spectroscopy,
Novel theoretical and computational methods,
Novel applications in photochemistry and photobiology,
Novel interpretational approaches as well as advances in data analysis based on electronic or vibrational spectroscopy.