Comprehensive assessment of the role of spectral data pre-processing in spectroscopy-based liquid biopsy
Ondřej Vrtělka, Kateřina Králová, Markéta Fousková, Vladimír Setnička
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, Volume 339, Article 126261 (published 2025-04-19)
DOI: 10.1016/j.saa.2025.126261
https://www.sciencedirect.com/science/article/pii/S1386142525005670
Abstract
Spectroscopic data often contain artifacts or noise related to sample characteristics, instrumental variations, or flaws in experimental design. Classifying the raw data is therefore not recommended, as it may lead to biased results. However, most of these issues can be addressed through appropriate data pre-processing. Effective pre-processing is particularly crucial in critical applications such as liquid biopsy for disease detection, where even minor performance improvements may affect patient outcomes. Unfortunately, there is no consensus on optimal pre-processing, which complicates cross-study comparisons.
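How pre-processing removes such artifacts can be illustrated with two common steps: Savitzky-Golay smoothing (a filtering step) and standard normal variate (SNV) normalization. This is a minimal sketch on synthetic data, assuming spectra are stored row-wise in a NumPy array; the window length, polynomial order, and the choice of these two methods are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth(spectra: np.ndarray, window: int = 11, polyorder: int = 3) -> np.ndarray:
    """Savitzky-Golay smoothing along the spectral (last) axis to suppress noise."""
    return savgol_filter(spectra, window_length=window, polyorder=polyorder, axis=-1)

def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: center and scale each spectrum individually."""
    mean = spectra.mean(axis=-1, keepdims=True)
    std = spectra.std(axis=-1, keepdims=True)
    return (spectra - mean) / std

# Hypothetical data: the same band on a sloping baseline, recorded at two
# different overall intensities (e.g. different sample concentrations), plus noise.
rng = np.random.default_rng(0)
x = np.linspace(400, 1800, 700)                  # wavenumber axis, cm^-1
band = np.exp(-((x - 1000) / 20) ** 2)
base = 0.001 * (x - 400)
spectra = np.vstack([
    1.0 * (band + base) + rng.normal(0, 0.02, x.size),
    2.5 * (band + base) + rng.normal(0, 0.02, x.size),
])

processed = snv(smooth(spectra))                 # filter first, then normalize
```

After SNV the two spectra nearly coincide, so a classifier would no longer see the intensity difference as a feature; whether that is desirable depends on whether absolute intensity carries diagnostic information.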
This study presents a comprehensive evaluation of various pre-processing methods and their combinations to assess their influence on classification results. The goal was to identify whether some pre-processing methods are associated with better classification performance and to find an optimal strategy for the given data. Data from Raman optical activity (ROA), infrared (IR), and Raman spectroscopy were processed with tens of thousands of possible pre-processing pipelines. The resulting data were classified using three algorithms to distinguish subjects with liver cirrhosis from those who had developed hepatocellular carcinoma.
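The combinatorial evaluation described above can be sketched as a grid over pipeline steps: every baseline × filter × normalization combination is applied, and each resulting data set is scored by cross-validated AUROC. This is a minimal sketch under stated assumptions: the step options, the synthetic two-class data, and the logistic regression classifier are hypothetical stand-ins; the study's actual methods, its three classifiers, and its tens of thousands of pipelines are not reproduced here.

```python
import itertools

import numpy as np
from scipy.signal import savgol_filter
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical step options. Each acts on every spectrum (row) independently,
# so applying them before cross-validation does not leak information between folds.
def identity(s):
    return s

def poly_baseline(s, deg=2):
    """Subtract a low-order polynomial fitted to each spectrum (a crude baseline)."""
    x = np.linspace(0.0, 1.0, s.shape[1])
    coefs = np.polyfit(x, s.T, deg)                 # shape (deg + 1, n_spectra)
    return s - (np.vander(x, deg + 1) @ coefs).T

def savgol(s):
    """Savitzky-Golay smoothing along the spectral axis."""
    return savgol_filter(s, window_length=11, polyorder=3, axis=-1)

def snv(s):
    """Standard normal variate normalization."""
    return (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)

def l2_norm(s):
    """Scale each spectrum to unit Euclidean norm."""
    return s / np.linalg.norm(s, axis=1, keepdims=True)

BASELINES = {"none": identity, "poly2": poly_baseline}
FILTERS = {"none": identity, "savgol": savgol}
NORMALIZATIONS = {"none": identity, "snv": snv, "l2": l2_norm}

def evaluate_pipelines(spectra, labels, clf, cv):
    """Score every baseline x filter x normalization combination by mean CV AUROC."""
    results = []
    for (b_name, b), (f_name, f), (n_name, n) in itertools.product(
        BASELINES.items(), FILTERS.items(), NORMALIZATIONS.items()
    ):
        X = n(f(b(spectra)))                        # baseline -> filter -> normalize
        auc = cross_val_score(clf, X, labels, cv=cv, scoring="roc_auc").mean()
        results.append(((b_name, f_name, n_name), float(auc)))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Synthetic two-class data (hypothetical): class 1 has a relatively stronger
# second band; intensity scale and baseline slope vary randomly per spectrum.
rng = np.random.default_rng(42)
n, p = 60, 200
x = np.linspace(0.0, 1.0, p)
labels = np.repeat([0, 1], n // 2)
band_a = np.exp(-((x - 0.3) / 0.02) ** 2)
band_b = np.exp(-((x - 0.7) / 0.02) ** 2)
ratio = (1.0 + 0.3 * labels)[:, None]
scale = rng.uniform(0.5, 2.0, (n, 1))
slope = rng.uniform(-1.0, 1.0, (n, 1))
spectra = scale * (band_a + ratio * band_b) + slope * x + rng.normal(0.0, 0.05, (n, p))

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = evaluate_pipelines(spectra, labels, LogisticRegression(max_iter=1000), cv)
for steps, auc in results[:3]:
    print(steps, round(auc, 3))
```

Because every step here transforms each spectrum on its own, it can be applied before the data are split into folds. Any step that learns parameters across samples (for example, feature scaling or PCA) would instead belong inside a scikit-learn `Pipeline` so that it is refitted within each training fold.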
The results showed that certain pre-processing methods frequently appeared among the best-performing pipelines, such as the Rolling Ball method for baseline correction of Raman spectra, or Doubly Reweighted Penalized Least Squares and the Mixture model for Raman optical activity. In contrast, the choice of filtering and/or normalization method usually had no significant impact. However, the composition of the top-scoring pipelines also depended on the classifier used. The best pipelines yielded an AUROC of 0.775–0.823, depending on the spectroscopic data and classifier evaluated.
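The Rolling Ball baseline correction mentioned above for Raman spectra can be approximated as a morphological opening (conceptually, a ball of fixed width rolled underneath the spectrum) followed by smoothing of the resulting curve. This is a simplified sketch, assuming evenly spaced 1-D data; the function name `rolling_ball_baseline`, the window sizes, and the synthetic spectrum are illustrative assumptions, and the result is not necessarily identical to the implementation used in the study.

```python
import numpy as np
from scipy.ndimage import grey_opening, uniform_filter1d

def rolling_ball_baseline(y, half_window=50, smooth_half_window=25):
    """Simplified rolling-ball baseline estimate (hypothetical helper).

    A morphological opening (erosion then dilation with a flat window of
    2 * half_window + 1 points) removes features narrower than the window,
    such as Raman bands; a moving average then smooths the stepwise result.
    """
    opened = grey_opening(y, size=2 * half_window + 1, mode="nearest")
    return uniform_filter1d(opened, size=2 * smooth_half_window + 1, mode="nearest")

# Synthetic spectrum (hypothetical): two narrow bands on a sloping baseline.
rng = np.random.default_rng(1)
x = np.arange(1000)
true_baseline = 1.0 + 0.001 * x
bands = np.exp(-((x - 300) / 5.0) ** 2) + np.exp(-((x - 700) / 5.0) ** 2)
y = true_baseline + bands + rng.normal(0.0, 0.01, x.size)

baseline = rolling_ball_baseline(y)
corrected = y - baseline                 # bands remain, sloping background removed
```

A wider `half_window` tolerates broader bands but follows baseline curvature less closely, so the window size is itself one of the parameters a pipeline search would need to vary.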
About the journal:
Spectrochimica Acta, Part A: Molecular and Biomolecular Spectroscopy (SAA) is an interdisciplinary journal that spans basic to applied aspects of optical spectroscopy in chemistry, medicine, biology, and materials science.
The journal publishes original scientific papers that feature high-quality spectroscopic data and analysis. From the broad range of optical spectroscopies, the emphasis is on electronic, vibrational or rotational spectra of molecules, rather than on spectroscopy based on magnetic moments.
Criteria for publication in SAA are novelty, uniqueness, and outstanding quality. Routine applications of spectroscopic techniques and computational methods are not appropriate.
Topics of particular interest to Spectrochimica Acta Part A include, but are not limited to:
Spectroscopy and dynamics of bioanalytical, biomedical, environmental, and atmospheric sciences,
Novel experimental techniques or instrumentation for molecular spectroscopy,
Novel theoretical and computational methods,
Novel applications in photochemistry and photobiology,
Novel interpretational approaches as well as advances in data analysis based on electronic or vibrational spectroscopy.