Integrative prediction model for radiation pneumonitis incorporating genetic and clinical-pathological factors using machine learning
Seo Hee Choi, Euidam Kim, Seok-Jae Heo, Mi Youn Seol, Yoonsun Chung, Hong In Yoon
Clinical and Translational Radiation Oncology, vol. 48, Article 100819. Published 2024-07-26. DOI: 10.1016/j.ctro.2024.100819
https://www.sciencedirect.com/science/article/pii/S240563082400096X
Citations: 0
Abstract
Purpose
We aimed to develop a machine learning-based prediction model for severe radiation pneumonitis (RP) that integrates clinicopathological and genetic factors, building on the associations of clinical parameters, dosimetric parameters, and single nucleotide polymorphisms (SNPs) in TGF-β1 pathway genes with RP.
Methods
We prospectively enrolled 59 patients with primary lung cancer undergoing radiotherapy and analyzed pretreatment blood samples, clinicopathological and dosimetric variables, and 11 functional SNPs in TGF-β pathway genes. Using the Synthetic Minority Over-sampling Technique (SMOTE) and nested cross-validation, we developed a machine learning-based prediction model for severe RP (grade ≥ 2). Feature selection was conducted using four methods (filter-based, wrapper-based, embedded, and logistic regression), and performance was evaluated using three machine learning models.
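The following is a minimal sketch of how over-sampling can be kept inside the training side of each cross-validation split, so that synthetic minority cases never leak into the held-out patients. It is not the authors' pipeline: the cohort is randomly generated toy data, the `smote` function is a simplified standard-library version of the technique, and the names `smote`, `stratified_folds`, and `balanced_training_sets` are hypothetical. A nested scheme would repeat the same splitting inside each training set for feature selection and tuning.

```python
import random

def smote(minority, n_new, k=3, rng=None):
    """Simplified SMOTE: interpolate between a minority sample and one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # Sort the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def stratified_folds(y, n_folds, rng):
    """Spread each class evenly over the folds to keep the class ratio."""
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds

def balanced_training_sets(X, y, n_folds=5, seed=0):
    """Yield (X_train, y_train, test_idx); only the training part is over-sampled."""
    rng = random.Random(seed)
    for test_idx in stratified_folds(y, n_folds, rng):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        minority = [x for x, v in zip(X_tr, y_tr) if v == 1]
        n_new = (len(y_tr) - len(minority)) - len(minority)  # fill up to 1:1
        X_new = smote(minority, n_new, rng=rng)
        yield X_tr + X_new, y_tr + [1] * len(X_new), test_idx

# Toy cohort (assumption): 59 synthetic patients, about one event in five.
toy_rng = random.Random(42)
y = [1] * 12 + [0] * 47
X = [[toy_rng.gauss(v, 1.0), toy_rng.gauss(0, 1.0)] for v in y]

splits = list(balanced_training_sets(X, y))
for X_tr, y_tr, test_idx in splits:
    # held-out size, training events, training non-events after over-sampling
    print(len(test_idx), sum(y_tr), len(y_tr) - sum(y_tr))
```

Each held-out fold keeps its original class imbalance, which is what the model would face in practice; only the data used for fitting is balanced.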
Results
At a median follow-up of 39.7 months, severe RP had occurred in 20.3% of patients. In our final model, age (>66 years), smoking history, PTV volume (>300 cc), and AG/GG genotype in BMP2 rs1979855 were identified as the most significant predictors. Adding genomic variables to the clinicopathological variables significantly improved the AUC compared with clinicopathological variables alone (0.822 vs. 0.741, p = 0.029). The wrapper-based method and the logistic model selected the same feature set, which gave the best performance with all three machine learning models (AUC: XGBoost 0.815, random forest 0.805, SVM 0.712).
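As an illustration of the quantities reported above, the sketch below encodes the four predictors as binary indicators using the cut-offs quoted in the abstract and computes an AUC as the fraction of (event, non-event) pairs ranked correctly, with ties counted as one half. The six patients are hypothetical, and scoring by a simple count of risk factors is an assumption made for illustration; it is not the fitted model from the study.

```python
def encode_patient(age_years, smoker, ptv_cc, genotype):
    """Binary indicators using the cut-offs quoted in the abstract."""
    return [
        int(age_years > 66),            # age > 66 years
        int(bool(smoker)),              # smoking history
        int(ptv_cc > 300),              # PTV volume > 300 cc
        int(genotype in ("AG", "GG")),  # BMP2 rs1979855 AG/GG genotype
    ]

def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs where the positive
    case has the higher score; ties contribute one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical patients: (age, smoker, PTV in cc, genotype, severe RP label)
patients = [
    (72, True, 350, "AG", 1),
    (70, True, 280, "GG", 1),
    (64, True, 310, "AG", 1),
    (60, False, 320, "AA", 0),
    (68, True, 320, "AA", 0),
    (55, False, 200, "AG", 0),
]

labels = [p[4] for p in patients]
# Assumed toy score: the number of risk factors present.
scores = [sum(encode_patient(*p[:4])) for p in patients]

print(round(auc(scores, labels), 3))  # prints 0.889 (8 of 9 pairs, one tie)
```

Comparing two such AUC values, as in the 0.822 vs. 0.741 comparison, additionally requires a significance test for correlated ROC curves, which this sketch does not attempt.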
Conclusion
We developed a machine learning-based prediction model for RP in which age, smoking history, PTV volume, and BMP2 rs1979855 genotype were significant predictors. Notably, incorporating SNP data significantly enhanced predictive performance compared with clinicopathological factors alone.