A clustering-based approach to address correlated features in predicting genitourinary toxicity from MRI-guided prostate SBRT
Pouyan Rezapoor, Jonathan Pham, Beth Neilsen, Hengjie Liu, Minsong Cao, Yingli Yang, Ke Sheng, Ting Martin Ma, James Lamb, Michael Steinberg, Amar U. Kishan, Zachary Taylor, Dan Ruan
Medical Physics, 52(6), 5104-5114. Published 2025-04-23. DOI: 10.1002/mp.17834
Background
It is common in outcome analysis to work with a large set of candidate prognostic features. However, the combination of high-dimensional input and a relatively small sample size leads to a risk of overfitting, low generalizability, and correlation bias.
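A small synthetic experiment illustrates one form of correlation bias: when the same signal enters a model as several nearly identical features, a penalized linear model spreads the weight across them. This is a minimal sketch under stated assumptions, not the study's analysis: it uses closed-form ridge (L2) regression in place of LASSO, because ridge splits weight among near-duplicates predictably while LASSO's choice among them is unstable. The data, the five-copy setup, and the penalty `lam=1.0` are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression without intercept: w = (X^T X + lam*I)^(-1) X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
n = 40                                   # small (hypothetical) cohort
x = rng.normal(size=n)                   # one underlying signal
y = 2.0 * x + 0.1 * rng.normal(size=n)   # synthetic outcome driven by that signal

# Case 1: the signal enters the model as a single feature.
w_single = ridge_fit(x[:, None], y, lam=1.0)

# Case 2: the same signal enters as five nearly identical copies
# (hypothetical stand-in for, e.g., dose-volume metrics at adjacent dose levels).
copies = np.column_stack([x + 0.01 * rng.normal(size=n) for _ in range(5)])
w_copies = ridge_fit(copies, y, lam=1.0)

print("single-feature weight:", np.round(w_single, 2))
print("per-copy weights:     ", np.round(w_copies, 2))
print("sum of copy weights:  ", round(float(w_copies.sum()), 2))
```

Each copy receives roughly one fifth of the single-feature weight, so any one of them looks far less important than the underlying signal actually is, even though the copies' weights still sum to about the same total.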
Purpose
This study addresses the mitigation of correlation bias in predicting genitourinary (GU) toxicity in prostate cancer patients who underwent MRI-guided stereotactic body radiation therapy (SBRT).
Methods
Typical dimension reduction or feature selection methods rely on regularization for sparsity or on information criteria. However, when (subsets of) the input features are heavily correlated, the weights assigned to the correlated features can be diluted to the extent that the corresponding features are no longer effective in the prediction, leading to suboptimal feature discovery and prediction. We propose to first perform advanced hierarchical clustering of the features and then apply regression modeling to the cluster centroids. This approach addresses the challenges posed by high dimensionality and ill-conditioning and improves the accuracy and reliability of the resulting prediction models. The proposed method was evaluated with typical regression models that have intrinsic feature reduction, namely Least Absolute Shrinkage and Selection Operator (LASSO) regularized logistic regression (LR), support vector machine (SVM), and decision trees (DT).
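The cluster-then-model pipeline described in the Methods can be sketched as follows. This is a minimal NumPy sketch under assumptions not taken from the paper: average-linkage agglomerative clustering with distance 1 minus Pearson correlation stands in for the authors' hierarchical clustering, each centroid is taken as the mean of its z-scored member features, and a plain gradient-descent logistic regression stands in for the downstream model. The function names and all data are hypothetical.

```python
import numpy as np

def cluster_features(X, n_clusters):
    """Average-linkage agglomerative clustering of the columns (features) of X.

    Distance between two features is 1 - Pearson r, so strongly positively
    correlated features merge first. Returns one cluster label per feature.
    """
    dist = 1.0 - np.corrcoef(X, rowvar=False)
    clusters = [[j] for j in range(X.shape[1])]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge b into a (a < b)
        del clusters[b]
    labels = np.empty(X.shape[1], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

def centroid_features(X, labels):
    """Represent each cluster by the mean of its z-scored member features."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return np.column_stack([Z[:, labels == k].mean(axis=1) for k in np.unique(labels)])

def fit_logistic(F, y, lr=0.1, n_iter=2000):
    """Unregularized logistic regression fitted by batch gradient descent."""
    F1 = np.column_stack([np.ones(len(F)), F])   # prepend an intercept column
    w = np.zeros(F1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-F1 @ w))
        w -= lr * F1.T @ (p - y) / len(y)
    return w

# Synthetic demo: 12 features generated from 3 hidden factors (4 features each).
rng = np.random.default_rng(1)
n_patients = 60
latent = rng.normal(size=(n_patients, 3))
X = np.column_stack([latent[:, f] + 0.2 * rng.normal(size=n_patients)
                     for f in range(3) for _ in range(4)])
y = (latent[:, 0] + 0.5 * rng.normal(size=n_patients) > 0).astype(float)  # outcome tied to factor 0

labels = cluster_features(X, n_clusters=3)   # features 0-3, 4-7, 8-11 should group together
C = centroid_features(X, labels)             # one column per cluster
w = fit_logistic(C, y)                       # intercept + one weight per centroid
print("cluster labels:", labels)
print("weights (intercept, centroids):", np.round(w, 2))
```

Because each centroid averages several highly correlated features, the model receives one well-conditioned input per cluster instead of many near-collinear ones, and the weight that would otherwise be diluted across the group is concentrated on a single centroid.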
Results
Extensive experiments showed that introducing cluster-based feature compaction and representation improved all regression models under fair hyperparameter tuning conditions. Although LASSO-LR and LR with clustered features performed similarly during training and validation, with LASSO-LR slightly better, the cluster-based feature method performed significantly better on the test set, achieving an AUC of 0.91 and an accuracy of 0.86, which demonstrates its advantage in stability and robustness. The best overall test performance was achieved by combining feature clustering into five representatives with SVM. An additional correlation study identified the individual features closely representing the cluster centroids as the exposure volume of the rectum at 2 Gy, the trigone exposure at 2 Gy and 41 Gy, the urethra at 42 Gy, and the rectal wall at 42 Gy. This indicates the importance of hot spot control in the urethra, trigone, and rectal wall for limiting toxicity.
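The correlation study that maps each cluster centroid back to a single interpretable feature can be sketched as below. It assumes, hypothetically, that a centroid is the mean of the cluster's z-scored features and that the representative is the member with the highest Pearson correlation to that centroid; the paper's exact criterion is not stated here. The cluster labels, the placeholder names `f0` to `f5`, and the data are synthetic, not the study's dose features.

```python
import numpy as np

def representative_features(X, labels, names):
    """For each cluster, return (name, r) of the member feature most correlated with the centroid."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    reps = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        centroid = Z[:, members].mean(axis=1)
        r = np.array([np.corrcoef(Z[:, j], centroid)[0, 1] for j in members])
        best = int(np.argmax(r))
        reps[int(k)] = (names[members[best]], float(r[best]))
    return reps

# Synthetic demo: two clusters of three features; f0 and f3 track their factor most closely.
rng = np.random.default_rng(2)
n = 80
latent = rng.normal(size=(n, 2))
noise_sd = [0.1, 1.0, 1.0, 0.1, 1.0, 1.0]
X = np.column_stack([latent[:, j // 3] + s * rng.normal(size=n)
                     for j, s in enumerate(noise_sd)])
labels = np.array([0, 0, 0, 1, 1, 1])       # assumed output of a prior clustering step
names = [f"f{j}" for j in range(6)]
reps = representative_features(X, labels, names)
for k, (name, r) in reps.items():
    print(f"cluster {k}: representative {name} (r = {r:.2f})")
```

A mapping of this kind is what allows abstract centroids to be read as concrete dose metrics, such as the rectum, trigone, urethra, and rectal wall exposures reported above.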
Conclusions
These findings underscore the superiority of the clustering method in mitigating correlation bias and enhancing predictive model accuracy. The current model also achieves state-of-the-art (SOTA) performance in predicting GU toxicity in MRI-guided prostate SBRT. Correlating dose features with the feature cluster centroids reveals the importance of hot spot control in the urethra, trigone, and rectal wall to reduce toxicity risk.