From prediction to practice: mitigating bias and data shift in machine-learning models for chemotherapy-induced organ dysfunction across unseen cancers.

BMJ oncology. Pub Date: 2024-11-02; eCollection Date: 2024-01-01; DOI: 10.1136/bmjonc-2024-000430

Matthew Watson, Pinkie Chambers, Luke Steventon, James Harmsworth King, Angelo Ercia, Heather Shaw, Noura Al Moubayed

BMJ oncology, volume 3, issue 1, e000430. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11557724/pdf/


Abstract

Objectives: Routine monitoring of renal and hepatic function during chemotherapy ensures that treatment-related organ damage has not occurred and clearance of subsequent treatment is not hindered; however, frequency and timing are not optimal. Model bias and data heterogeneity concerns have hampered the ability of machine learning (ML) to be deployed into clinical practice. This study aims to develop models that could support individualised decisions on the timing of renal and hepatic monitoring while exploring the effect of data shift on model performance.

Methods and analysis: We used retrospective data from three UK hospitals to develop and validate ML models predicting unacceptable rises in creatinine/bilirubin post cycle 3 for patients undergoing treatment for the following cancers: breast, colorectal, lung, ovarian and diffuse large B-cell lymphoma.

Results: We extracted data for 3614 patients with no missing blood test results across cycles 1-6 of chemotherapy treatment. We improved on previous work by including predictions post cycle 3. Optimised for sensitivity, our models achieve F2 scores of 0.7773 (bilirubin) and 0.6893 (creatinine) on unseen data. Performance is consistent on tumour types unseen during training (F2 bilirubin: 0.7423, F2 creatinine: 0.6820).
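For readers unfamiliar with the metric, the F2 score reported above is the F-beta score with beta = 2, which weights recall (sensitivity) more heavily than precision. The following is a minimal sketch of that standard formula; the function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion-matrix counts.

    beta > 1 weights recall (sensitivity) above precision; beta = 2 gives F2.
    Returns 0.0 when the score is undefined (no positives predicted or present).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical counts for illustration only (not results from the paper).
tp, fp, fn = 80, 40, 10

f1 = fbeta_from_counts(tp, fp, fn, beta=1.0)  # balanced precision/recall
f2 = fbeta_from_counts(tp, fp, fn, beta=2.0)  # recall-weighted

print(f"F1 = {f1:.4f}, F2 = {f2:.4f}")
```

In this toy case recall (80/90) exceeds precision (80/120), so F2 comes out higher than F1, which is the behaviour one would want from a metric chosen to favour sensitivity.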

Conclusion: Our technique highlights the effectiveness of ML in clinical settings, demonstrating the potential to improve the delivery of care. Notably, our ML models can generalise to unseen tumour types. We propose gold-standard bias mitigation steps for ML models: evaluation on multisite data, thorough patient population analysis, and both formalised bias measures and model performance comparisons on patient subgroups. We demonstrate that data aggregation techniques have unintended consequences on model bias.
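The abstract does not state which formal bias measures were used. As one possible illustration of comparing model performance across patient subgroups, the sketch below computes sensitivity per subgroup and the largest gap between subgroups (sometimes called an equal-opportunity difference). The choice of measure, the function names, and the toy labels are all assumptions made for illustration.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups):
    """Sensitivity (recall) computed separately for each subgroup label.

    Subgroups with no positive cases are omitted, since recall is undefined.
    """
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            if p == 1:
                true_pos[g] += 1
    return {g: true_pos[g] / positives[g] for g in positives}


def recall_gap(per_group):
    """Largest difference in recall between any two subgroups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy labels for two hypothetical subgroups, "A" and "B" (not study data).
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
groups = ["A"] * 5 + ["B"] * 5

per_group = recall_by_group(y_true, y_pred, groups)
gap = recall_gap(per_group)
print(per_group, round(gap, 4))
```

A large gap would suggest that the model misses more true cases in one subgroup than in another, which is the kind of disparity a subgroup comparison is intended to surface.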


