{"title":"Systematic investigation of preprocessing pipeline for MALDI data","authors":"Mou Adhikari , Oleg Ryabchykov , Shuxia Guo , Thomas Bocklitz","doi":"10.1016/j.rechem.2026.103175","DOIUrl":null,"url":null,"abstract":"<div><h3>Background</h3><div>Matrix-assisted laser desorption/ ionization Mass Spectrometry (MALDI-MS) is a powerful tool to detect and characterize biomolecules, making it particularly useful in different fields and applications such as proteomics, clinical diagnostics, and biomarker discovery. MALDI data is commonly contaminated by the artefacts originated from both chemical and electrical noise. Data preprocessing is hence important to remove these artefacts and improve the accuracy and reliability of the subsequent (quantitative and qualitative) analysis. A systematic investigation of different preprocessing steps is necessary to establish an effective preprocessing pipeline.</div></div><div><h3>Results</h3><div>In this study, we systematically investigated the different steps including interpolation, smoothing, baseline correction, peak alignment, and peak binning, along with normalization to establish a preprocessing pipeline of MALDI spectral data. The performance of the preprocessing steps and pipeline was benchmarked by the balanced accuracy of differentiating hepatocellular carcinoma (HCC) and healthy (normal) based on MALDI spectral data of liver tissue samples. The established preprocessing pipeline improved the balanced accuracy from 61.3% to 77.6% under the patient-level cross-validation, and from 92.9% to 94.7% under spectral-level cross-validation.</div></div><div><h3>Significance</h3><div>Our findings demonstrated that the classification performance can be greatly affected by the quality of MALDI data, which can be improved by preprocessing steps. 
The large improvement from the patient-level validations after preprocessing demonstrated well a satisfying performance of the classification against patient-to-patient variability with the help of our preprocessing pipeline. This study will potentially benefit the MS community.</div></div>","PeriodicalId":420,"journal":{"name":"Results in Chemistry","volume":"24 ","pages":"Article 103175"},"PeriodicalIF":4.2000,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Results in Chemistry","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2211715626001487","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2026/2/26 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Citation count: 0
Abstract
Background
Matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) is a powerful tool for detecting and characterizing biomolecules, which makes it useful in fields such as proteomics, clinical diagnostics, and biomarker discovery. MALDI data are commonly contaminated by artefacts originating from both chemical and electrical noise. Data preprocessing is therefore important for removing these artefacts and improving the accuracy and reliability of subsequent quantitative and qualitative analysis. A systematic investigation of the individual preprocessing steps is necessary to establish an effective preprocessing pipeline.
Results
In this study, we systematically investigated the individual steps, namely interpolation, smoothing, baseline correction, peak alignment, peak binning, and normalization, to establish a preprocessing pipeline for MALDI spectral data. The performance of the preprocessing steps and of the full pipeline was benchmarked by the balanced accuracy of differentiating hepatocellular carcinoma (HCC) from healthy (normal) tissue based on MALDI spectral data of liver tissue samples. The established preprocessing pipeline improved the balanced accuracy from 61.3% to 77.6% under patient-level cross-validation and from 92.9% to 94.7% under spectral-level cross-validation.
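To make the steps named above concrete, the following is a minimal sketch of four of them (interpolation, smoothing, baseline correction, normalization) applied to a synthetic spectrum. It is not the authors' implementation: the function names, the step order, the window sizes, and the specific methods (linear interpolation, moving average, rolling-minimum baseline, total-ion-count normalization) are all assumptions chosen for brevity, and peak alignment and peak binning are left out.

```python
import math
from bisect import bisect_left

# Illustrative sketch only: names, methods, and parameters are assumptions,
# not the pipeline evaluated in the paper.

def interpolate(mz, intensity, grid):
    """Linearly interpolate a spectrum (ascending mz) onto a common m/z grid."""
    out = []
    for x in grid:
        if x <= mz[0]:
            out.append(intensity[0])
        elif x >= mz[-1]:
            out.append(intensity[-1])
        else:
            j = bisect_left(mz, x)  # first index with mz[j] >= x
            t = (x - mz[j - 1]) / (mz[j] - mz[j - 1])
            out.append(intensity[j - 1] + t * (intensity[j] - intensity[j - 1]))
    return out

def smooth(y, half_width=2):
    """Moving-average smoothing; the window is truncated at the edges."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out

def subtract_baseline(y, half_width=10):
    """Crude baseline estimate: subtract the local minimum in a sliding window."""
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(y[i] - min(y[lo:hi]))
    return out

def normalize_tic(y):
    """Scale intensities so they sum to 1 (total-ion-count normalization)."""
    total = sum(y)
    return [v / total for v in y] if total > 0 else list(y)

def preprocess(mz, intensity, grid):
    """Chain the steps in one assumed order."""
    y = interpolate(mz, intensity, grid)
    y = smooth(y)
    y = subtract_baseline(y)
    return normalize_tic(y)

# Synthetic spectrum: a decaying baseline plus one peak near m/z 120.
mz = [100.0 + 0.7 * i for i in range(60)]
intensity = [
    50 * math.exp(-(m - 100) / 30) + 100 * math.exp(-((m - 120) / 1.0) ** 2)
    for m in mz
]
grid = [100.0 + 0.5 * i for i in range(80)]

processed = preprocess(mz, intensity, grid)
peak_mz = grid[processed.index(max(processed))]
print("strongest peak at m/z", peak_mz)
```

In practice each step has several competing methods and parameters, which is the kind of choice a systematic comparison is meant to settle; the sketch fixes one option per step only to show how the stages compose.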
Significance
Our findings demonstrate that classification performance is strongly affected by the quality of MALDI data, which can be improved by preprocessing. The large improvement under patient-level cross-validation after preprocessing (61.3% to 77.6% balanced accuracy) indicates that the pipeline makes the classification more robust to patient-to-patient variability. This study will potentially benefit the MS community.
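The distinction between patient-level and spectral-level cross-validation, and the balanced-accuracy metric used as the benchmark, can be illustrated with a short sketch. The helper names, the round-robin fold assignment, and the toy patient labels below are hypothetical and are not taken from the study. The sketch only shows the general mechanism: when folds are assigned per spectrum, spectra from the same patient can appear in both the training and the test set, whereas per-patient assignment prevents that.

```python
# Illustrative sketch only: helper names and the toy data are assumptions.

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

def patient_level_folds(patient_ids, n_folds):
    """Assign whole patients to folds so no patient is split across folds."""
    patients = sorted(set(patient_ids))
    fold_of = {p: k % n_folds for k, p in enumerate(patients)}
    return [fold_of[p] for p in patient_ids]

def spectrum_level_folds(n_spectra, n_folds):
    """Assign individual spectra to folds, ignoring which patient they came from."""
    return [i % n_folds for i in range(n_spectra)]

def shared_patients(patient_ids, folds, test_fold):
    """Patients contributing spectra to both the training and the test set."""
    test = {p for p, f in zip(patient_ids, folds) if f == test_fold}
    train = {p for p, f in zip(patient_ids, folds) if f != test_fold}
    return test & train

# Toy data: four hypothetical patients with three spectra each.
patient_ids = ["P1"] * 3 + ["P2"] * 3 + ["P3"] * 3 + ["P4"] * 3

by_patient = patient_level_folds(patient_ids, n_folds=2)
by_spectrum = spectrum_level_folds(len(patient_ids), n_folds=2)

overlap_patient = shared_patients(patient_ids, by_patient, test_fold=0)
overlap_spectrum = shared_patients(patient_ids, by_spectrum, test_fold=0)
print("patient-level overlap:", sorted(overlap_patient))
print("spectrum-level overlap:", sorted(overlap_spectrum))

# Balanced accuracy on a small imbalanced example (1 = HCC, 0 = normal).
score = balanced_accuracy([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
print("balanced accuracy:", score)
```

Because balanced accuracy averages the recall of each class, it is not inflated when one class dominates the data set, which makes it a reasonable choice when the numbers of HCC and normal spectra differ.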