Systematic investigation of preprocessing pipeline for MALDI data

Impact factor: 4.2 · JCR Q2 (Chemistry, Multidisciplinary)
Results in Chemistry · Pub Date: 2026-05-05 · Epub Date: 2026-02-26 · DOI: 10.1016/j.rechem.2026.103175
Mou Adhikari, Oleg Ryabchykov, Shuxia Guo, Thomas Bocklitz
Results in Chemistry, volume 24, Article 103175. Full text: https://www.sciencedirect.com/science/article/pii/S2211715626001487

Abstract

Background

Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is a powerful tool for detecting and characterizing biomolecules, which makes it useful in fields such as proteomics, clinical diagnostics, and biomarker discovery. MALDI data are commonly contaminated by artefacts originating from both chemical and electrical noise. Data preprocessing is therefore important for removing these artefacts and improving the accuracy and reliability of the subsequent quantitative and qualitative analysis. A systematic investigation of the individual preprocessing steps is necessary to establish an effective preprocessing pipeline.

Results

In this study, we systematically investigated the preprocessing steps of interpolation, smoothing, baseline correction, peak alignment, peak binning, and normalization to establish a preprocessing pipeline for MALDI spectral data. The performance of the individual steps and of the full pipeline was benchmarked by the balanced accuracy of differentiating hepatocellular carcinoma (HCC) from healthy (normal) tissue based on MALDI spectra of liver tissue samples. The established pipeline improved the balanced accuracy from 61.3% to 77.6% under patient-level cross-validation and from 92.9% to 94.7% under spectral-level cross-validation.
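
To make the order of the steps concrete, the following is a minimal sketch of such a pipeline in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the algorithms used for each step, so linear interpolation, a moving average, a rolling-minimum baseline, fixed-width binning, and total-ion-current (TIC) normalization are stand-ins chosen for brevity. Peak alignment is omitted here, and all function names, window sizes, and the toy spectrum are hypothetical.

```python
from bisect import bisect_left


def interpolate(mz, intensity, grid):
    """Linearly interpolate a spectrum onto a common m/z grid (mz must be sorted)."""
    out = []
    for x in grid:
        if x <= mz[0]:
            out.append(intensity[0])
        elif x >= mz[-1]:
            out.append(intensity[-1])
        else:
            j = bisect_left(mz, x)  # mz[j - 1] < x <= mz[j]
            t = (x - mz[j - 1]) / (mz[j] - mz[j - 1])
            out.append(intensity[j - 1] + t * (intensity[j] - intensity[j - 1]))
    return out


def smooth(y, half_window=2):
    """Moving-average smoothing; the window shrinks at the edges."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def subtract_baseline(y, half_window=20):
    """Estimate the baseline as a rolling minimum and subtract it."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        out.append(y[i] - min(y[lo:hi]))
    return out


def bin_spectrum(y, width=4):
    """Sum intensities in consecutive bins of `width` grid points."""
    return [sum(y[i:i + width]) for i in range(0, len(y), width)]


def normalize_tic(y):
    """Scale the spectrum so that its total intensity (TIC) equals 1."""
    total = sum(y)
    return [v / total for v in y] if total > 0 else list(y)


def preprocess(mz, intensity, grid):
    """Apply the assumed steps in order (peak alignment is not sketched)."""
    y = interpolate(mz, intensity, grid)
    y = smooth(y)
    y = subtract_baseline(y)
    y = bin_spectrum(y)
    return normalize_tic(y)


# Toy spectrum: constant offset of 5 with one peak near m/z 1008.4.
grid = [1000.0 + 0.5 * i for i in range(40)]
mz = [1000.0 + 0.7 * i for i in range(30)]
intensity = [5.0 + (50.0 if i == 12 else 0.0) for i in range(30)]
processed = preprocess(mz, intensity, grid)
```

Interpolating first puts every spectrum on the same m/z axis, so the later steps and the classifier see features with a shared meaning. The order of the remaining steps is itself a design choice of the kind the study evaluates.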

Significance

Our findings demonstrate that classification performance is strongly affected by the quality of the MALDI data, which can be improved by preprocessing. The large improvement under patient-level cross-validation after preprocessing shows that the pipeline makes the classification more robust to patient-to-patient variability. This study may benefit the MS community.
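
The difference between the two validation schemes can be sketched as follows. This is a hedged illustration rather than the study's code: the fold assignment, the function names, and the toy labels are assumptions. In patient-level cross-validation all spectra from one patient stay in the same fold, so the model is always tested on unseen patients. In spectral-level cross-validation spectra from the same patient can appear in both training and test sets, which would typically give more optimistic scores. Balanced accuracy is the mean of the per-class recalls.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def patient_level_folds(patient_ids, n_folds):
    """Assign whole patients to folds; yield (train_idx, test_idx) pairs."""
    patients = sorted(set(patient_ids))
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    for k in range(n_folds):
        test = [i for i, p in enumerate(patient_ids) if fold_of[p] == k]
        train = [i for i, p in enumerate(patient_ids) if fold_of[p] != k]
        yield train, test


# Hypothetical example: two spectra per patient, four patients.
patient_ids = ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"]
folds = list(patient_level_folds(patient_ids, n_folds=2))

# Toy predictions: HCC recall is 2/3, normal recall is 1/1.
y_true = ["HCC", "HCC", "HCC", "normal"]
y_pred = ["HCC", "HCC", "normal", "normal"]
score = balanced_accuracy(y_true, y_pred)
```

In the toy labels the plain accuracy is 3/4, while the balanced accuracy averages the two recalls (2/3 and 1), which keeps the majority class from dominating the score.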
