Impact of analytical bias on machine learning models for sepsis prediction using laboratory data.

Impact factor 3.8 | CAS Zone 2 (Medicine) | JCR Q1, Medical Laboratory Technology

Clinical Chemistry and Laboratory Medicine | Published: 2025-05-28 | DOI: 10.1515/cclm-2025-0491

Meryem Rumeysa Yesil, Ilaria Talli, Michela Pelloso, Chiara Cosma, Elisa Pangrazzi, Mario Plebani, Yasemin Ustundag, Andrea Padoan


Citations: 0

Abstract

Objectives: Machine learning (ML) models trained on laboratory data can support early sepsis prediction. However, analytical bias in laboratory measurements can compromise their performance and validity in real-world settings. We aimed to evaluate how bias within analytically acceptable limits may affect the validity and generalizability of ML models trained on laboratory data.

Methods: A support vector machine (SVM) model for sepsis prediction was developed using complete blood count and erythrocyte sedimentation rate data from outpatients (CS, n=104) and from patients in acute inflammatory status wards (SS, n=107). Twenty-six bias combinations were derived from the white blood cell (WBC), platelet (PLT), and erythrocyte sedimentation rate (ESR) biases permitted by analytical performance specifications (APS). The diagnostic performance under each of the 26 conditions was compared with that on the original dataset.
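The bias conditions described in the Methods can be sketched as proportional shifts applied to each analyte. The minimal Python sketch below assumes that a bias of b % multiplies every result by (1 + b/100), and that the minimum and optimum limits are 1.5x and 0.5x the desirable limit, the usual biological-variation convention; the minimum and optimum values reported in the Results agree with that ratio to within rounding. The names `bias_tiers`, `apply_bias` and `single_analyte_conditions` are illustrative only. Enumerating each analyte, tier and sign yields 18 single-analyte conditions; the abstract does not specify how the remaining combinations making up the 26 were formed, so the sketch does not attempt them.

```python
from itertools import product

# Desirable bias limits (%) as reported in the Results of the abstract.
DESIRABLE_BIAS = {"WBC": 5.1, "PLT": 4.5, "ESR": 21.1}

# Assumption: minimum = 1.5x and optimum = 0.5x the desirable limit
# (the usual biological-variation convention for bias tiers).
TIER_FACTORS = {"minimum": 1.5, "desirable": 1.0, "optimum": 0.5}


def bias_tiers(desirable: float) -> dict[str, float]:
    """Return the minimum, desirable and optimum bias limits (%) for one analyte."""
    return {tier: desirable * factor for tier, factor in TIER_FACTORS.items()}


def apply_bias(values: list[float], bias_pct: float) -> list[float]:
    """Simulate a constant proportional bias of bias_pct % on every result."""
    return [v * (1 + bias_pct / 100) for v in values]


def single_analyte_conditions() -> list[tuple[str, str, float]]:
    """List (analyte, tier, signed bias %) for every analyte, tier and sign."""
    conditions = []
    for analyte, desirable in DESIRABLE_BIAS.items():
        tiers = bias_tiers(desirable)
        for (tier, limit), sign in product(tiers.items(), (+1, -1)):
            conditions.append((analyte, tier, sign * limit))
    return conditions


if __name__ == "__main__":
    for analyte, desirable in DESIRABLE_BIAS.items():
        print(analyte, {t: round(b, 2) for t, b in bias_tiers(desirable).items()})
    print(len(single_analyte_conditions()), "single-analyte conditions")  # 18
    # Hypothetical WBC results (10^9/L) under the -5.1 % desirable-limit bias.
    print(apply_bias([8.0, 14.0], -5.1))
```

Treating bias as a multiplicative factor matches how the limits are expressed (as percentages); a constant additive offset would need a different `apply_bias`.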

Results: On the original dataset, the SVM achieved an AUC of 90.6 % [95 % CI: 80.6-98.7 %]. The minimum, desirable and optimum acceptable biases were 7.7, 5.1 and 2.6 % for WBC; 6.7, 4.5 and 2.2 % for PLT; and 31.6, 21.1 and 10.5 % for ESR, respectively. Across the tested conditions, AUC values included 89.8 % [95 % CI: 79.0-97.7 %] for a PLT bias of -6.7 %, 89.5 % [95 % CI: 79.1-98.0 %] for an ESR bias of +31.6 %, and 90.4 % [95 % CI: 79.3-98.4 %] for a WBC bias of -5.1 %. With combined biases, the lowest AUC was 87.8 % [95 % CI: 75.9-96.6 %]. No statistically significant differences in AUC were observed (p>0.05).
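The AUC values and 95 % confidence intervals in the Results can be illustrated with a short sketch. The abstract does not state how the intervals were computed; the code below assumes the rank-based (Mann-Whitney) definition of AUC and a percentile bootstrap that resamples cases and controls separately, which is one common choice and not necessarily the authors' method. The functions `auc` and `bootstrap_auc_ci` are hypothetical helpers, and the score lists are made-up illustrative values, not study data.

```python
import random


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the probability that a case outscores a control (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_auc_ci(scores_pos, scores_neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling each class separately."""
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    stats = []
    for _ in range(n_boot):
        pos = rng.choices(scores_pos, k=len(scores_pos))
        neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auc(pos, neg))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Illustrative decision scores, NOT study data.
    cases = [0.9, 0.8, 0.7, 0.6]
    controls = [0.5, 0.4, 0.65, 0.3]
    print(auc(cases, controls))  # 0.9375
    print(bootstrap_auc_ci(cases, controls))
```

With the toy scores, 15 of the 16 case-control pairs are ranked correctly, so the AUC is 15/16 = 0.9375. Resampling each class separately keeps the case/control balance of every bootstrap replicate equal to that of the original sample.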

Conclusions: Analytical bias can influence model performance, depending on which parameters are affected and how their biases combine. Developing validation strategies that assess the impact of analytical bias in laboratory data on ML models could improve the reliability of these models.
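One way to read the suggested validation strategy is as a bias stress test: keep a trained model fixed, re-score the test set after shifting one analyte by each acceptable bias, and track the change in AUC. The sketch below assumes a hypothetical linear `toy_score` in place of the study's SVM, made-up patient values, and a proportional bias model; `Sample`, `with_bias` and `bias_stress_test` are illustrative names, not part of the study.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Sample:
    """One patient's results (assumed units: WBC and PLT in 10^9/L, ESR in mm/h)."""
    wbc: float
    plt: float
    esr: float
    septic: bool


def toy_score(s: Sample) -> float:
    """Hypothetical stand-in for a trained model's decision function (not the study's SVM)."""
    return 0.3 * s.wbc - 0.01 * s.plt + 0.05 * s.esr


def auc(samples: list[Sample], score) -> float:
    """Mann-Whitney AUC of the model score, counting ties as half."""
    pos = [score(s) for s in samples if s.septic]
    neg = [score(s) for s in samples if not s.septic]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def with_bias(s: Sample, analyte: str, bias_pct: float) -> Sample:
    """Return a copy of s with one analyte shifted by a proportional bias."""
    return replace(s, **{analyte: getattr(s, analyte) * (1 + bias_pct / 100)})


def bias_stress_test(samples, score, conditions):
    """Return the baseline AUC and the AUC change for each (analyte, bias %) condition."""
    baseline = auc(samples, score)
    deltas = {}
    for analyte, bias_pct in conditions:
        biased = [with_bias(s, analyte, bias_pct) for s in samples]
        deltas[(analyte, bias_pct)] = auc(biased, score) - baseline
    return baseline, deltas


if __name__ == "__main__":
    # Made-up illustrative patients, NOT study data.
    data = [
        Sample(12.0, 180.0, 60.0, True),
        Sample(15.5, 320.0, 45.0, True),
        Sample(9.8, 150.0, 80.0, True),
        Sample(7.0, 250.0, 15.0, False),
        Sample(14.0, 160.0, 40.0, False),
        Sample(6.2, 300.0, 10.0, False),
    ]
    # Signed biases taken from the conditions quoted in the Results.
    conditions = [("plt", -6.7), ("esr", 31.6), ("wbc", -5.1)]
    baseline, deltas = bias_stress_test(data, toy_score, conditions)
    print(f"baseline AUC = {baseline:.3f}")
    for (analyte, bias_pct), delta in deltas.items():
        print(f"{analyte} {bias_pct:+.1f} %: delta AUC = {delta:+.3f}")
```

Because AUC depends only on how scores are ranked, a proportional bias on one analyte cannot change the AUC of a model whose score is a monotonic function of that analyte alone; changes appear when the model combines several analytes, as a multivariable SVM does. A bias can still move patients across a fixed decision threshold even when AUC is unchanged, so threshold-dependent metrics such as sensitivity and specificity are worth tracking as well.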


Source journal: Clinical Chemistry and Laboratory Medicine (Medicine: Medical Laboratory Technology)

CiteScore: 11.30
Self-citation rate: 16.20%
Publication volume: 306
Review time: 3 months

Journal description: Clinical Chemistry and Laboratory Medicine (CCLM) publishes articles on novel teaching and training methods applicable to laboratory medicine. CCLM welcomes contributions on progress in fundamental and applied research and in cutting-edge clinical laboratory medicine. It is one of the leading journals in the field, with an impact factor over 3. CCLM is issued monthly, in print and electronically. CCLM is the official journal of the European Federation of Clinical Chemistry and Laboratory Medicine (EFLM) and regularly publishes EFLM recommendations and news. CCLM is also the official journal of the national societies of Austria (ÖGLMKC), Belgium (RBSLM), Germany (DGKL), Hungary (MLDT), Ireland (ACBI), Italy (SIBioC), Portugal (SPML) and Slovenia (SZKK), and it is affiliated with AACB (Australia) and SFBC (France).

Topics:
- clinical biochemistry
- clinical genomics and molecular biology
- clinical haematology and coagulation
- clinical immunology and autoimmunity
- clinical microbiology
- drug monitoring and analysis
- evaluation of diagnostic biomarkers
- disease-oriented topics (cardiovascular disease, cancer diagnostics, diabetes)
- new reagents, instrumentation and technologies
- new methodologies
- reference materials and methods
- reference values and decision limits
- quality and safety in laboratory medicine
- translational laboratory medicine
- clinical metrology