A clustering-based approach to address correlated features in predicting genitourinary toxicity from MRI-guided prostate SBRT

IF 3.2 · Zone 2 (Medicine) · JCR Q1: Radiology, Nuclear Medicine & Medical Imaging
Medical Physics · Pub Date: 2025-04-23 · DOI: 10.1002/mp.17834
Pouyan Rezapoor, Jonathan Pham, Beth Neilsen, Hengjie Liu, Minsong Cao, Yingli Yang, Ke Sheng, Ting Martin Ma, James Lamb, Michael Steinberg, Amar U. Kishan, Zachary Taylor, Dan Ruan
Medical Physics, vol. 52, no. 6, pp. 5104-5114. https://onlinelibrary.wiley.com/doi/10.1002/mp.17834

Abstract

Background

It is common in outcome analysis to work with a large set of candidate prognostic features. However, the combination of high-dimensional input and a relatively small sample size leads to a risk of overfitting, low generalizability, and correlation bias.

Purpose

This study addresses the mitigation of correlation bias in the context of predicting genitourinary (GU) toxicity in prostate cancer patients who underwent MRI-guided stereotactic body radiation therapy (SBRT).

Methods

Typical dimension reduction or feature selection methods rely on sparsity-promoting regularization or information criteria. However, when (subsets of) input features are heavily correlated, the weights assigned to the correlated features can be diluted to the extent that those features are no longer effective in the prediction, leading to suboptimal feature discovery and prediction. We propose to perform advanced hierarchical clustering and then apply regression modeling to the cluster centroids. This approach addresses the challenges posed by high dimensionality and ill-conditioning, and improves the accuracy and reliability of the resulting prediction models. Performance of the proposed method was evaluated on typical regression models with intrinsic feature reduction methods, namely Least Absolute Shrinkage and Selection Operator (LASSO) regularized logistic regression (LR), support vector machine (SVM), and decision trees (DT).
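As a rough illustration of the clustering step described above (a sketch under stated assumptions, not the authors' implementation), the code below groups correlated feature columns with hierarchical clustering and replaces each group by its centroid. The abstract does not specify the dissimilarity, the linkage, or how centroids are formed; the choices here (1 - Pearson r, average linkage, mean of standardized features), the function name `cluster_feature_centroids`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def cluster_feature_centroids(X, n_clusters):
    """Group correlated columns of X and return one centroid per group.

    X is an (n_samples, n_features) array. Returns (centroids, labels):
    centroids has shape (n_samples, k), labels[j] is the cluster of feature j.
    """
    # Standardize so no feature dominates a centroid because of its scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Assumed dissimilarity: 1 - Pearson r (strongly correlated = close).
    corr = np.corrcoef(Z, rowvar=False)
    dist = np.clip(1.0 - corr, 0.0, None)
    np.fill_diagonal(dist, 0.0)
    # Assumed linkage: average. squareform converts to condensed form.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Centroid of a cluster = per-sample mean of its standardized features.
    centroids = np.column_stack(
        [Z[:, labels == c].mean(axis=1) for c in np.unique(labels)]
    )
    return centroids, labels


# Synthetic example: 3 latent factors, each observed through 4 noisy features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
X = np.repeat(latent, 4, axis=1) + 0.1 * rng.normal(size=(60, 12))

centroids, labels = cluster_feature_centroids(X, n_clusters=3)
print(centroids.shape)  # 12 correlated features reduced to 3 centroid columns
print(labels)           # cluster id per original feature
```

The centroid matrix could then be passed to any downstream classifier (for example an L1-regularized logistic regression or an SVM) in place of the original, heavily correlated feature matrix.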

Results

Extensive experiments show that introducing cluster-based feature compaction and representation improves all regression models under fair hyperparameter tuning conditions. Although LASSO-LR and LR with clustered features performed similarly during training and validation, with LASSO-LR slightly better, the cluster-based feature method performed significantly better on the test set, achieving 0.91 AUC and 0.86 accuracy and demonstrating its advantage in stability and robustness. The overall best test performance was achieved by combining feature clustering to five representatives with SVM. An additional correlation study identified the individual features most closely representing the cluster centroids as the exposure volume of the rectum at 2 Gy, trigone exposure at 2 Gy and 41 Gy, urethra exposure at 42 Gy, and rectal wall exposure at 42 Gy. This indicates the importance of hot spot control of the urethra, trigone, and rectal wall for toxicity control.
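The correlation study mentioned above can be pictured as follows: for each cluster centroid, find the single original feature whose Pearson correlation with that centroid is largest in magnitude. This is only a minimal sketch; the function name `closest_features`, the feature names, and the synthetic data are hypothetical and do not reproduce the study's dose features or results.

```python
import numpy as np


def closest_features(X, centroids, names):
    """For each centroid column, return (feature name, Pearson r) of the
    original feature most strongly correlated with it."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Cz = (centroids - centroids.mean(axis=0)) / centroids.std(axis=0)
    # Pearson r between every feature and every centroid:
    # shape (n_features, n_centroids).
    r = Xz.T @ Cz / X.shape[0]
    best = np.abs(r).argmax(axis=0)
    return [(names[j], float(r[j, k])) for k, j in enumerate(best)]


# Synthetic example: two clusters of five features each. One feature per
# cluster (a0 and b3) carries much less noise than its neighbours.
rng = np.random.default_rng(1)
n = 200
noise_a = np.array([0.05, 0.8, 0.8, 0.8, 0.8])
noise_b = np.array([0.8, 0.8, 0.8, 0.05, 0.8])
A = rng.normal(size=(n, 1)) + noise_a * rng.normal(size=(n, 5))
B = rng.normal(size=(n, 1)) + noise_b * rng.normal(size=(n, 5))

X = np.hstack([A, B])
centroids = np.column_stack([A.mean(axis=1), B.mean(axis=1)])
names = [f"a{i}" for i in range(5)] + [f"b{i}" for i in range(5)]

reps = closest_features(X, centroids, names)
for name, r in reps:
    print(f"{name}: r = {r:.2f}")
```

Reporting the nearest individual feature keeps the model interpretable: the regression is fitted on centroids, but each centroid can be described by a concrete, clinically meaningful quantity.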

Conclusions

These findings underscore the superiority of the clustering method in mitigating correlation bias and enhancing predictive model accuracy. The current model also achieves state-of-the-art (SOTA) performance in predicting GU toxicity in MRI-guided prostate SBRT. Correlating dose features to feature cluster centroids reveals the importance of hot spot control of the urethra, trigone, and rectal wall in reducing toxicity risk.
