Machine Learning Algorithm to Explore Patients With Heterogeneous Treatment Effects of Clinically Significant CMV Infection and Non-Relapse Mortality After HSCT.
{"title":"Machine Learning Algorithm to Explore Patients With Heterogeneous Treatment Effects of Clinically Significant CMV Infection and Non-Relapse Mortality After HSCT.","authors":"Takashi Toya, Yujiro Nakajima, Konan Hara, Satoshi Kaito, Tetsuya Nishida, Naoyuki Uchida, Naoki Shingai, Wataru Takeda, Yukiyasu Ozawa, Masatsugu Tanaka, Satoshi Yoshihara, Yuta Katayama, Tetsuya Eto, Masashi Sawa, Shuichi Ota, Hiroyuki Ohigashi, Satoru Takada, Keisuke Kataoka, Junya Kanda, Takahiro Fukuda, Masao Ogata, Ayumi Taguchi, Yoshiko Atsuta","doi":"10.1002/jha2.70117","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Clinically significant cytomegalovirus infection (csCMVi) and non-relapse mortality (NRM) remain serious concerns after allogeneic hematopoietic stem cell transplantation (HSCT), but subpopulations with heterogeneous treatment effects (HTEs) is unclear. Although machine learning (ML) algorithms have recently been applied to HSCT, the methodology has not been well elucidated.</p><p><strong>Methods: </strong>We developed a ML algorithm which combined weighting procedures and left-truncated and right-censored trees based on classification and regression tree algorithms to fit survival data with time-varying covariates and competing risks comprehensively. The Japanese large-scale registry data were applied to the algorithm to explore subpopulations with HTEs of csCMVi and NRM after HSCT. Its performance was evaluated by comparing their c-indices with those of the conventional Fine-Gray model.</p><p><strong>Results: </strong>A total of 10,480 patients were divided into training (75%) and test (25%) cohorts; the training cohort was used to develop the ML model. Using the model, patient CMV-seropositivity, patient age, and acute graft-versus-host disease were identified as important predictors of csCMVi. 
In addition, the patients were successfully classified by the estimated cumulative incidence of csCMVi, which varied from 22.7% at 0.5 year to 82.7%. This model also depicts interpretable survival trees in various settings. Similarly, the patients can be also classified based on the estimated 3-year NRM, which varied from 8.0% to 48.5%. C-indices of the ML and the Fine-Gray model using the test cohort showed comparable performance.</p><p><strong>Conclusion: </strong>A reliable, explainable, and interpretable ML model was developed to explore subpopulations with HTEs of csCMVi and NRM after HSCT. <b>Trial Registration</b>: The authors have confirmed clinical trial registration is not needed for this submission.</p>","PeriodicalId":72883,"journal":{"name":"EJHaem","volume":"6 4","pages":"e70117"},"PeriodicalIF":1.2000,"publicationDate":"2025-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12335206/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"EJHaem","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/jha2.70117","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/1 0:00:00","PubModel":"eCollection","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Introduction: Clinically significant cytomegalovirus infection (csCMVi) and non-relapse mortality (NRM) remain serious concerns after allogeneic hematopoietic stem cell transplantation (HSCT), but the subpopulations with heterogeneous treatment effects (HTEs) are unclear. Although machine learning (ML) algorithms have recently been applied to HSCT, the methodology has not been well elucidated.
Methods: We developed an ML algorithm that combines weighting procedures with left-truncated and right-censored trees, based on classification and regression tree algorithms, to comprehensively fit survival data with time-varying covariates and competing risks. The algorithm was applied to large-scale Japanese registry data to explore subpopulations with HTEs of csCMVi and NRM after HSCT. Its performance was evaluated by comparing its c-indices with those of the conventional Fine-Gray model.
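The c-index used for this comparison measures how often a model ranks the patient with the earlier event as the higher-risk one. The following is a minimal sketch of Harrell's c-index for right-censored data, not the authors' implementation: the function name `harrell_c_index` and the toy inputs are hypothetical, tied event times are simply skipped, and competing risks (which the study handles and which call for a modified definition of comparable pairs) are ignored here.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative only).

    times:       follow-up time per patient
    events:      1 if the event of interest was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified sketch
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect = harrell_c_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
partial = harrell_c_index(toy_times, toy_events, [0.9, 0.1, 0.4, 0.7])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so two models can be compared by computing this quantity for each on the same held-out cohort.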
Results: A total of 10,480 patients were divided into training (75%) and test (25%) cohorts; the training cohort was used to develop the ML model. Using the model, patient CMV seropositivity, patient age, and acute graft-versus-host disease were identified as important predictors of csCMVi. In addition, the patients were successfully classified by the estimated cumulative incidence of csCMVi at 0.5 years, which varied from 22.7% to 82.7%. The model also produces interpretable survival trees in various settings. Similarly, the patients could also be classified by the estimated 3-year NRM, which varied from 8.0% to 48.5%. In the test cohort, the c-indices of the ML model and the Fine-Gray model were comparable.
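The cumulative incidences reported above are probabilities of one event type in the presence of competing events. The abstract does not state how these were estimated within each subgroup, so the following is only a sketch of the standard nonparametric (Aalen-Johansen) estimator. The function name `cumulative_incidence`, the event codes (0 = censored, 1 = event of interest, 2 = competing event), and the toy data are all assumptions made for illustration.

```python
def cumulative_incidence(times, causes, cause_of_interest, horizon):
    """Aalen-Johansen cumulative incidence at `horizon` (illustrative only).

    times:  follow-up time per patient
    causes: 0 = censored, any other integer = event-type code (hypothetical)
    """
    n_at_risk = len(times)
    event_free = 1.0  # all-cause event-free probability just before time t
    cif = 0.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        at_t = [c for time, c in zip(times, causes) if time == t]
        d_interest = sum(1 for c in at_t if c == cause_of_interest)
        d_any = sum(1 for c in at_t if c != 0)
        # Probability of being event-free up to t, then failing from the
        # cause of interest at t.
        cif += event_free * d_interest / n_at_risk
        # Any event type removes patients from the event-free state.
        event_free *= 1.0 - d_any / n_at_risk
        # Everyone observed at t (events and censorings) leaves the risk set.
        n_at_risk -= len(at_t)
    return cif


# Toy data (hypothetical): 1 = event of interest, 2 = competing event.
toy_times = [1, 2, 3, 4, 5]
toy_causes = [1, 2, 0, 1, 0]
cif_interest = cumulative_incidence(toy_times, toy_causes, 1, horizon=5)
cif_competing = cumulative_incidence(toy_times, toy_causes, 2, horizon=5)
```

Unlike one minus a Kaplan-Meier estimate that treats competing events as censoring, this estimator does not let patients who experienced the competing event contribute further risk, so the cause-specific incidences sum to the overall event probability.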
Conclusion: A reliable, explainable, and interpretable ML model was developed to explore subpopulations with HTEs of csCMVi and NRM after HSCT. Trial Registration: The authors have confirmed that clinical trial registration is not needed for this submission.