Comprehensive assessment of the role of spectral data pre-processing in spectroscopy-based liquid biopsy

Impact factor 4.3 · Zone 2 (Chemistry) · JCR Q1 (Spectroscopy)
Ondřej Vrtělka, Kateřina Králová, Markéta Fousková, Vladimír Setnička
DOI: 10.1016/j.saa.2025.126261
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy, Vol. 339, Article 126261 (published 2025-04-19)
https://www.sciencedirect.com/science/article/pii/S1386142525005670
Citations: 0

Abstract

Spectroscopic data often contain artifacts or noise related to sample characteristics, instrumental variations, or flaws in experimental design. Classifying raw data is therefore not recommended and may lead to biased results. However, most of these issues can be addressed through appropriate data pre-processing. Effective pre-processing is particularly crucial in critical applications such as liquid biopsy for disease detection, where even minor performance improvements may affect patient outcomes. Unfortunately, there is no consensus on optimal pre-processing, which complicates cross-study comparisons.
This study presents a comprehensive evaluation of various pre-processing methods and their combinations to assess their influence on classification results. The goal was to determine whether particular pre-processing methods are associated with better classification performance and to find an optimal strategy for the given data. Raman optical activity (ROA), infrared (IR), and Raman spectroscopic data were processed with tens of thousands of possible pre-processing pipelines. The resulting data were classified using three algorithms to distinguish subjects with liver cirrhosis from those who had developed hepatocellular carcinoma.
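Evaluating "tens of thousands of possible pre-processing pipelines" amounts to taking one option per pre-processing step and forming every combination. The abstract does not list the authors' actual methods, step order, or parameter grids, so the option names below are hypothetical placeholders; this is only a minimal sketch of how such a grid can be enumerated and counted.

```python
from itertools import product
from math import prod

# Hypothetical option lists: the abstract does not enumerate the authors' actual
# methods or parameter settings, so these names are placeholders only.
STEPS = {
    "baseline": ["none", "rolling_ball", "drPLS", "mixture_model", "polynomial"],
    "filter": ["none", "savitzky_golay", "moving_average"],
    "normalization": ["none", "vector", "snv", "area"],
}


def enumerate_pipelines(steps):
    """Yield every combination of one option per step, as a step -> option dict."""
    names = list(steps)
    for combo in product(*(steps[name] for name in names)):
        yield dict(zip(names, combo))


def count_pipelines(steps):
    """Number of pipelines = product of the number of options at each step."""
    return prod(len(options) for options in steps.values())


pipelines = list(enumerate_pipelines(STEPS))
print(count_pipelines(STEPS))  # 60 (= 5 * 3 * 4)
print(pipelines[1])  # {'baseline': 'none', 'filter': 'none', 'normalization': 'vector'}
```

Each pipeline would then be applied to the spectra and the result scored with each classifier. Because the count is a product, adding a few parameter settings per method, or further candidate methods, multiplies it quickly, which is how such a grid can reach tens of thousands of pipelines.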
The results showed that certain pre-processing methods frequently ranked among the best-performing pipelines, such as the Rolling Ball method for baseline correction of Raman spectra, or Doubly Reweighted Penalized Least Squares and the Mixture model for Raman optical activity. In contrast, the choice of filtering and/or normalization approach usually did not have a significant impact. Nonetheless, the composition of the top-scoring pipelines also depended on the classifier used. The best pipelines yielded an AUROC of 0.775–0.823, depending on the spectroscopic data and the classifier.
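The abstract does not say which Rolling Ball variant the authors used. As a minimal sketch, assuming the common morphological interpretation, the baseline is the grey-scale opening of the spectrum with a semicircular ("ball") structuring element: an erosion followed by a dilation. The function name `rolling_ball_baseline` and its `radius` parameter are hypothetical, and the code assumes intensities and point spacing share one scale, which real spectra usually do not without rescaling.

```python
import numpy as np


def rolling_ball_baseline(y, radius):
    """Hypothetical helper: estimate a baseline as the grey-scale opening of `y`
    with a semicircular ("rolling ball") structuring element of half-width
    `radius` points.

    Assumes intensity units and point spacing share one scale; real spectra
    usually need the intensities rescaled first. Returns (baseline, y - baseline).
    """
    if radius < 1:
        raise ValueError("radius must be a positive integer")
    y = np.asarray(y, dtype=float)
    n = y.size
    offsets = np.arange(-radius, radius + 1)
    # Height profile of the ball's upper surface; a constant offset would cancel.
    cap = np.sqrt(radius**2 - offsets**2)

    # Erosion: highest vertical offset at which the ball centred at each point
    # still lies on or below the spectrum. +inf padding means samples outside
    # the spectrum never constrain the ball.
    padded = np.pad(y, radius, constant_values=np.inf)
    eroded = np.min(
        [padded[radius + k : radius + k + n] - c for k, c in zip(offsets, cap)],
        axis=0,
    )

    # Dilation: at each point, the highest surface reached by any fitted ball.
    # This upper envelope is the baseline and never exceeds the spectrum.
    padded = np.pad(eroded, radius, constant_values=-np.inf)
    baseline = np.max(
        [padded[radius + k : radius + k + n] + c for k, c in zip(offsets, cap)],
        axis=0,
    )
    return baseline, y - baseline


# Synthetic example: flat background of 1.0 plus one narrow Gaussian band.
x = np.arange(200)
spectrum = 1.0 + 5.0 * np.exp(-((x - 100) ** 2) / 18.0)
baseline, corrected = rolling_ball_baseline(spectrum, radius=30)
```

Away from the band the flat background is recovered exactly. Under the band the ball rises part-way into the peak, so some peak height is absorbed into the baseline; a larger radius (a flatter ball) reduces this but follows curved backgrounds less closely, which is why the radius is normally tuned to the spectra.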


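Source journal: Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy
CiteScore: 8.40
Self-citation rate: 11.40%
Articles published: 1364
Review time: 40 days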
Journal description: Spectrochimica Acta, Part A: Molecular and Biomolecular Spectroscopy (SAA) is an interdisciplinary journal which spans from basic to applied aspects of optical spectroscopy in chemistry, medicine, biology, and materials science. The journal publishes original scientific papers that feature high-quality spectroscopic data and analysis. From the broad range of optical spectroscopies, the emphasis is on electronic, vibrational or rotational spectra of molecules, rather than on spectroscopy based on magnetic moments. Criteria for publication in SAA are novelty, uniqueness, and outstanding quality. Routine applications of spectroscopic techniques and computational methods are not appropriate. Topics of particular interest to Spectrochimica Acta Part A include, but are not limited to: Spectroscopy and dynamics of bioanalytical, biomedical, environmental, and atmospheric sciences, Novel experimental techniques or instrumentation for molecular spectroscopy, Novel theoretical and computational methods, Novel applications in photochemistry and photobiology, Novel interpretational approaches as well as advances in data analysis based on electronic or vibrational spectroscopy.