Estimating baselines of Raman spectra based on transformer and manually annotated data
Jiangsan Zhao, Tomasz Woznicki, Krzysztof Kusnierek
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, vol. 330, article 125679 (published 2024-12-27)
DOI: 10.1016/j.saa.2024.125679
Abstract
Raman spectroscopy is a powerful, non-invasive analytical method for determining the chemical composition and molecular structure of a wide range of materials, including complex biological tissues. However, the captured signals typically suffer from interference in the form of noise and a background baseline, both of which must be removed for successful data analysis. Effective baseline correction is critical in quantitative analysis because it can affect how peak signatures are derived. Current baseline correction methods can be labor-intensive and may require extensive parameter adjustment depending on the characteristics of the input spectrum. In contrast, deep learning-based baseline correction models trained across various materials offer a promising and more versatile alternative. This study reports an approach to manually identify the ground-truth baselines of eight different biological materials by extensively tuning the parameters of three classical baseline correction methods, namely Modified Multi-Polynomial Fit (Modpoly), Improved Modified Multi-Polynomial Fitting (IModpoly), and Adaptive Iteratively Reweighted Penalized Least Squares (airPLS), and then combining their outputs to best fit the training data. We designed a one-dimensional Transformer (1dTrans) tailored to Raman spectral data to estimate their baselines, and evaluated its performance against a convolutional neural network (CNN), ResUNet, and the three aforementioned parametric methods. The 1dTrans model achieved lower mean absolute error (MAE) and spectral angle mapper (SAM) scores than the other methods on both the development and evaluation sets of manually labeled raw Raman spectra, highlighting its effectiveness for Raman spectral pre-processing.
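The abstract names ModPoly and airPLS but does not give their update rules or the parameter values the authors tuned. The following is a minimal NumPy sketch of the commonly published forms of these two algorithms, not the authors' implementation. The function names `modpoly` and `airpls` and every default (`degree`, `lam`, `max_iter`, `tol`) are illustrative assumptions. IModPoly, a noise-aware refinement of ModPoly, is omitted for brevity.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Sketch only: function names and parameter defaults are illustrative assumptions,
# not the settings used in the study.


def modpoly(y, degree=3, max_iter=100, tol=1e-3):
    """ModPoly-style baseline: fit a polynomial, pull every point that lies above
    the fit down onto it, and repeat until the clipped signal stops changing."""
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float)
    work = y.copy()
    for _ in range(max_iter):
        # Polynomial.fit rescales x internally, which keeps the fit well conditioned
        fit = Polynomial.fit(x, work, degree)(x)
        clipped = np.minimum(work, fit)  # peaks are clipped down to the fit
        change = np.linalg.norm(clipped - work) / max(np.linalg.norm(work), 1e-12)
        work = clipped
        if change < tol:
            break
    return Polynomial.fit(x, work, degree)(x)


def airpls(y, lam=1e5, max_iter=15):
    """airPLS-style baseline: a Whittaker smoother with a second-difference penalty
    whose weights are set to zero above the current baseline and grow
    exponentially below it."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), 2, axis=0)  # (n-2, n) second-difference operator
    penalty = lam * D.T @ D            # dense for clarity; long spectra call for sparse matrices
    w = np.ones(n)
    z = y
    for t in range(1, max_iter + 1):
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        d = y - z
        neg = d[d < 0]                 # residuals of points lying below the current baseline
        total = np.abs(neg).sum()
        if neg.size == 0 or total < 1e-3 * np.abs(y).sum():
            break
        w = np.zeros(n)                # points above the baseline (peaks) get zero weight
        w[d < 0] = np.exp(t * np.abs(neg) / total)
        w[0] = w[-1] = np.exp(t * np.abs(neg).max() / total)  # anchor both ends
    return z
```

On a synthetic spectrum made of a linear baseline plus one narrow Gaussian peak, both functions should return a curve that passes under the peak and stays close to the line. The dense `np.linalg.solve` keeps the airPLS sketch short; for spectra with thousands of points a sparse solver is the usual choice.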
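MAE and SAM, the two scores used to compare the models, can be computed per spectrum as below. The abstract does not say whether SAM is reported in radians or degrees, or how per-spectrum scores are aggregated over a dataset, so this sketch assumes radians and scores one pair of equal-length spectra (or baselines) at a time; the function names are hypothetical.

```python
import numpy as np

# Assumption: SAM in radians, computed for one estimate/reference pair at a time.


def mae(estimate, reference):
    """Mean absolute error between two spectra (or baselines) of equal length."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs(estimate - reference)))


def sam(estimate, reference):
    """Spectral angle mapper: the angle (in radians) between two spectra treated
    as vectors. It ignores overall scale, so it scores shape agreement only."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    cos = estimate @ reference / (np.linalg.norm(estimate) * np.linalg.norm(reference))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))  # clip guards against rounding past ±1
```

For example, `sam` returns π/2 for orthogonal vectors and approximately 0 for `[1, 2, 3]` versus `[2, 4, 6]`. This is why the two metrics complement each other: SAM is insensitive to a global intensity scale, which MAE penalizes.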