Abstract 160: LASSO-based protein signatures for survival prediction in human cancer cohorts
Mariam M. Konaté, Ming-Chung Li, Lisa McShane, Yingdong Zhao
DOI: 10.1158/1538-7445.AM2021-160
Published: 2021-07-01
Abstract
Background: Large-scale multi-omics data characterizing human tumors are increasingly available and can be leveraged to develop a deeper understanding of biological processes and to predict clinical outcomes. Reverse-phase protein array (RPPA) is a high-throughput, antibody-based method that provides a more direct assessment of cellular activity than DNA and RNA sequencing, whose data do not always correlate with protein expression. Multiple studies have demonstrated the prognostic value of RPPA data. Some of these studies have used pathway-driven approaches, relying on prior knowledge from the literature to group proteins into biological pathways, to develop prognostic signatures or predictors of treatment response.

Methods: We obtained normalized RPPA data for up to 258 total, cleaved, acetylated, or phosphorylated protein species from The Cancer Proteome Atlas (TCPA). Our starting point was a published RPPA-based seven-protein signature of receptor tyrosine kinase (RTK) pathway activity, computed as an unweighted sum of the seven protein measurements and shown to have prognostic value in a 445-patient renal clear cell carcinoma cohort (TCGA-KIRC). We demonstrated that strong stratification of patients into high- and low-risk groups can be achieved with a statistical approach, LASSO regression, that uses no a priori biological knowledge to select from the 233 proteins and optimally combine their RPPA measurements into a weighted risk score. Method performance was assessed in two ways: 1) 10 iterations of 3-fold cross-validation for unbiased estimation of the hazard ratio and the difference in 5-year survival (by the Kaplan-Meier method) between predictor-defined high- and low-risk groups; and 2) a permutation test to evaluate the statistical significance of the cross-validated log-rank statistic.
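The permutation test named in the Methods can be illustrated with a minimal sketch. It is a simplification and not the authors' procedure: it permutes the high/low risk labels of a fixed grouping, whereas the abstract describes testing the cross-validated log-rank statistic, which would presumably require repeating the whole cross-validation for each permutation. The function names `logrank_chi2` and `permutation_p`, the default number of permutations, and the seed are all hypothetical choices made for illustration.

```python
import random


def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    times  : follow-up times
    events : 1 = death observed, 0 = censored
    groups : 1 = high risk, 0 = low risk
    """
    obs_minus_exp = 0.0
    variance = 0.0
    # Visit each distinct time at which at least one death occurred.
    for u in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for t in times if t >= u)  # at risk, both groups
        n1 = sum(1 for t, g in zip(times, groups) if t >= u and g == 1)
        d = sum(1 for t, e in zip(times, events) if e and t == u)
        d1 = sum(1 for t, e, g in zip(times, events, groups)
                 if e and t == u and g == 1)
        obs_minus_exp += d1 - d * n1 / n  # observed minus expected deaths
        if n > 1:  # hypergeometric variance of d1
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def permutation_p(times, events, groups, n_perm=1999, seed=0):
    """Permutation p-value: share of label shuffles with a statistic at
    least as large as the observed one (the observed labeling is counted)."""
    rng = random.Random(seed)
    observed = logrank_chi2(times, events, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any link between label and outcome
        if logrank_chi2(times, events, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so the resolution of a reported permutation p-value depends on how many permutations were run.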
Results: In the first evaluation, the median hazard ratio between high- and low-risk groups across the held-out folds of the cross-validation was 2.4 for the 7-protein RTK score, compared with 3.3 for the risk score derived by applying LASSO to the training folds. The median difference in overall survival probability at 5 years was 32.8% for the LASSO-derived risk score, compared with 25.2% for the 7-protein RTK score. The permutation test p values were 5.0e-4 for both the RTK pathway-driven and the LASSO data-driven approaches. Finally, we demonstrated the applicability and performance of our approach for overall survival prediction in additional TCGA cohorts: ovarian serous cystadenocarcinoma (TCGA-OVCA), sarcoma (TCGA-SARC), and cutaneous melanoma (TCGA-SKCM).

Conclusions: The data-driven nature of our LASSO-based approach makes it versatile and particularly well-suited for the discovery of unexplored protein/disease associations that could aid in therapeutic discovery.

Citation Format: Mariam M. Konate, Ming-Chung Li, Lisa McShane, Yingdong Zhao. LASSO-based protein signatures for survival prediction in human cancer cohorts [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 160.
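The two risk scores and the 5-year survival difference reported in the Results can be illustrated with a small sketch. It shows an unweighted sum (the form of the RTK score), a weighted sum (the form of a LASSO-derived score), and a Kaplan-Meier estimate of the survival gap between risk groups. The abstract does not state how the high- and low-risk groups were cut, so the median split used here is an assumption, and the names `risk_score`, `km_survival`, and `survival_gap` are hypothetical. No LASSO fitting is performed; the weights would come from a separately fitted penalized Cox model.

```python
from statistics import median


def risk_score(measurements, weights=None):
    """Weighted sum of protein measurements.

    weights=None gives the unweighted sum (the form of the RTK score);
    LASSO would supply weights, many of them exactly zero.
    """
    if weights is None:
        weights = [1.0] * len(measurements)
    return sum(w * x for w, x in zip(weights, measurements))


def km_survival(times, events, horizon):
    """Kaplan-Meier estimate of S(horizon); events: 1 = death, 0 = censored."""
    surv = 1.0
    death_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for u in death_times:
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, e in zip(times, events) if e and t == u)
        surv *= 1.0 - deaths / at_risk  # product-limit update
    return surv


def survival_gap(scores, times, events, horizon=5.0):
    """S_low(horizon) - S_high(horizon).

    Assumption: patients with a score above the median are 'high risk';
    the abstract does not specify the cutoff rule.
    """
    cut = median(scores)

    def group_surv(keep):
        idx = [i for i, s in enumerate(scores) if keep(s)]
        return km_survival([times[i] for i in idx],
                           [events[i] for i in idx], horizon)

    return group_surv(lambda s: s <= cut) - group_surv(lambda s: s > cut)
```

In a cross-validated setting, the cutoff and the weights would be determined on the training folds and applied unchanged to the held-out fold, so that the gap is estimated on patients who did not influence the predictor.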