Development and validation of an interpretable machine learning model for predicting Philadelphia chromosome-positive acute lymphoblastic leukaemia using clinical and laboratory parameters: a single-centre retrospective study.

IF 2.4 | Zone 3, Medicine | JCR Q1, Medicine, General & Internal
Wuchen Yang, Jingya Liu, Yang Gou, Xingqin Huang, Maoshan Chen, Dezhi Huang, Shengwang Wu, Jing Zhang, Cheng Zhang, Shuiqing Liu, Xiangui Peng, Xi Zhang
BMJ Open 2025;15(6):e097526. Journal article, published 27 June 2025. doi:10.1136/bmjopen-2024-097526 (https://doi.org/10.1136/bmjopen-2024-097526)

Abstract

Objective: To develop and validate a prediction model for Philadelphia chromosome-positive acute lymphoblastic leukaemia (Ph+ALL).

Design: A single-centre retrospective study.

Participants: This study analysed 471 newly diagnosed patients with ALL at the Second Affiliated Hospital of Army Medical University from January 2014 to December 2023.

Methods: Clinical and laboratory parameters were collected, and the most informative parameters were selected using BorutaShap. Multiple machine learning (ML) models were constructed and optimised using an active learning (AL) algorithm. Performance was evaluated using the area under the curve (AUC), comprehensive indicators and decision curve analysis. Model interpretability was assessed using SHapley Additive exPlanations (SHAP), and external validation was conducted on an independent test cohort.
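As an illustration of the AUC evaluation mentioned above, the following is a minimal sketch of a rank-based AUC with a percentile bootstrap 95% CI. It is an assumption, not the authors' procedure: the abstract does not state how the confidence intervals were computed, and the function names (`auc`, `bootstrap_ci`) and the toy labels and scores are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]   # resample with replacement
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:                      # skip one-class resamples
            continue
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = Ph+ALL, 0 = Ph-negative ALL.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.65, 0.9]

print(auc(labels, scores))           # 0.8125 (13 of 16 positive/negative pairs ranked correctly)
print(bootstrap_ci(labels, scores))  # interval depends on the seed and n_boot
```

With only a handful of cases the bootstrap interval is very wide; the sketch is meant to show the mechanics, not to reproduce the reported intervals.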

Results: Ten parameters were selected to construct multiple ML models. The CatBoost model integrated with an AL algorithm (CatBoost-AL) was the most effective model for predicting Ph+ALL in the validation data set. This model achieved an AUC of 0.797 (95% CI 0.710 to 0.884), with sensitivity, specificity and F1 score of 0.667, 0.864 and 0.777, respectively. The prediction performance of CatBoost-AL was further validated on an external testing set, where it maintained a similar AUC of 0.794 (95% CI 0.707 to 0.881). In the global SHAP interpretability analysis, age, monocyte count, γ-glutamyl transferase, neutrophil count and alanine aminotransferase were identified as the parameters with the greatest influence on the predictions of CatBoost-AL.
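For readers less familiar with the indicators reported above, the sketch below shows the standard definitions of sensitivity, specificity and F1 score in terms of confusion-matrix counts. The `Confusion` class and the counts are hypothetical and are not taken from the study; the abstract does not say which averaging scheme was used for its F1 score, so this shows only the usual positive-class definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary classifier (hypothetical helper)."""
    tp: int  # Ph+ALL predicted as Ph+ALL
    fn: int  # Ph+ALL predicted as Ph-negative
    fp: int  # Ph-negative predicted as Ph+ALL
    tn: int  # Ph-negative predicted as Ph-negative

    @property
    def sensitivity(self) -> float:
        # Share of true positives that the model detects (recall).
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Share of true negatives that the model correctly rules out.
        return self.tn / (self.tn + self.fp)

    @property
    def precision(self) -> float:
        # Share of positive predictions that are correct.
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity, written in counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)

# Illustrative counts only, not data from the paper.
c = Confusion(tp=30, fn=10, fp=5, tn=55)
print(f"sensitivity={c.sensitivity:.3f} specificity={c.specificity:.3f} "
      f"precision={c.precision:.3f} F1={c.f1:.3f}")
```

Because F1 depends on precision as well as sensitivity, it can sit above or below the sensitivity value depending on how many false positives the model produces.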

Conclusion: An interpretable ML model and online prediction tool were developed to determine whether newly diagnosed patients with ALL have Ph+ALL. The key parameters identified by the optimal model provided a further understanding of Ph+ALL characteristics and were valuable for accurate diagnosis and treatment of Ph+ALL.

Source journal: BMJ Open (Medicine, General & Internal)
CiteScore: 4.40
Self-citation rate: 3.40%
Articles published: 4510
Review time: 2-3 weeks
About the journal: BMJ Open is an online, open access journal, dedicated to publishing medical research from all disciplines and therapeutic areas. The journal publishes all research study types, from study protocols to phase I trials to meta-analyses, including small or specialist studies. Publishing procedures are built around fully open peer review and continuous publication, publishing research online as soon as the article is ready.