Common laboratory results-based artificial intelligence analysis achieves accurate classification of plasma cell dyscrasias.

IF 2.3 · Zone 3, Biology · Q2 MULTIDISCIPLINARY SCIENCES

PeerJ · Pub Date: 2024-11-04 · eCollection Date: 2024-01-01 · DOI: 10.7717/peerj.18391

Bihua Yao, Yicheng Liu, Yuwei Wu, Siyu Mao, Hangbiao Zhang, Lei Jiang, Cheng Fei, Shuang Wang, Jijun Tong, Jianguo Wu

PeerJ, volume 12, e18391. Publication type: Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11542560/pdf/

Citation count: 0

Abstract

Background: Plasma cell dyscrasias encompass a diverse set of disorders for which early and precise diagnosis is essential to optimizing patient outcomes. Despite advances in diagnostics, artificial intelligence (AI) remains underused in the analysis of routine laboratory data. This study seeks to construct an AI-driven model that uses standard laboratory parameters to improve diagnostic accuracy and classification efficiency in plasma cell dyscrasias.

Methods: Data from 1,188 participants (609 with plasma cell dyscrasias and 579 controls) collected between 2018 and 2023 were analyzed. Initial variable selection employed Kruskal-Wallis and Wilcoxon tests, followed by dimensionality reduction and variable prioritization using the Shapley Additive Explanations (SHAP) approach. Nine pivotal variables were identified, including hemoglobin (HGB), serum creatinine, and β₂-microglobulin. Using these variables, four machine learning models were developed and evaluated: gradient boosting decision tree (GBDT), support vector machine (SVM), deep neural network (DNN), and decision tree (DT). Performance metrics such as accuracy, recall, and area under the curve (AUC) were assessed through 5-fold cross-validation. A subtype classification model was also developed, analyzing data from 380 cases to classify disorders such as multiple myeloma (MM) and monoclonal gammopathy of undetermined significance (MGUS).
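The 5-fold cross-validation step can be illustrated with a minimal sketch. The abstract does not say whether the folds were stratified or how they were shuffled, so the stratification, the seed, and the function name `stratified_k_fold` below are assumptions made for illustration. Only the cohort sizes (609 cases, 579 controls) come from the text.

```python
import random
from collections import Counter


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep roughly the overall class ratio.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a near-equal share of this class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx


# Cohort sizes from the abstract: 609 cases (label 1) and 579 controls (label 0).
labels = [1] * 609 + [0] * 579
splits = list(stratified_k_fold(labels, k=5))
fold_counts = [Counter(labels[i] for i in test_idx) for _, test_idx in splits]

for fold_number, counts in enumerate(fold_counts, start=1):
    print(f"fold {fold_number}: {counts[1]} cases, {counts[0]} controls")
```

Under these assumptions each held-out fold contains 121 or 122 cases and 115 or 116 controls, so every model is scored on data with about the same case-to-control ratio as the full cohort.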

Results:
1. Variable selection: The SHAP method pinpointed nine critical variables, including hemoglobin (HGB), serum creatinine, erythrocyte sedimentation rate (ESR), and β₂-microglobulin.
2. Diagnostic model performance: The GBDT model exhibited superior diagnostic performance for plasma cell dyscrasias, achieving 93.5% accuracy, 98.1% recall, and an AUC of 0.987. External validation reinforced its robustness, with 100% accuracy and an F1 score of 98.5%.
3. Subtype classification: The DNN model excelled in classifying multiple myeloma, MGUS, and light-chain myeloma, demonstrating sensitivity and specificity above 90% across all subtypes.
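For readers less familiar with the reported metrics, the sketch below shows how accuracy, recall (sensitivity), specificity, F1, and AUC are conventionally computed. The confusion-matrix counts and prediction scores are hypothetical values chosen for illustration, not data from the study, and the function names are likewise assumptions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "recall": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win; this pairwise form equals the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical counts and scores, NOT taken from the paper.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
auc = auc_from_scores(pos_scores=[0.9, 0.8, 0.4], neg_scores=[0.7, 0.3, 0.2])

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {auc:.3f}")
```

The pairwise AUC definition is the same quantity that underlies the Wilcoxon rank-sum statistic, which is why rank-based tests and AUC often appear together in studies of this kind.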

Conclusions: AI models based on routine laboratory results significantly enhance the precision of diagnosing and classifying plasma cell dyscrasias, presenting a promising avenue for early detection and individualized treatment strategies.

Source journal: PeerJ (Multidisciplinary Sciences)
CiteScore: 4.70
Self-citation rate: 3.70%
Articles published: 1665
Review time: 10 weeks

About the journal: PeerJ is an open access peer-reviewed scientific journal covering research in the biological and medical sciences. At PeerJ, authors take out a lifetime publication plan (for as little as $99) which allows them to publish articles in the journal for free, forever. PeerJ has 5 Nobel Prize winners on its board, has won several industry and media awards, and is widely recognized as one of the most interesting recent developments in academic publishing.