Abstract 160: LASSO-based protein signatures for survival prediction in human cancer cohorts

Mariam M. Konaté, Ming-Chung Li, L. McShane, Yingdong Zhao
DOI: https://doi.org/10.1158/1538-7445.AM2021-160

Abstract

Background: Large-scale multi-omics data characterizing human tumors are increasingly available and can be leveraged to deepen our understanding of biological processes and to predict clinical outcomes. Reverse-phase protein array (RPPA) is a high-throughput, antibody-based method that assesses cellular activity more directly than DNA and RNA sequencing, whose data do not always correlate with protein expression. Multiple studies have demonstrated the prognostic value of RPPA data. Some of these studies have used pathway-driven approaches, which rely on prior knowledge from the literature to group proteins into biological pathways, to develop prognostic signatures or predictors of treatment response.

Methods: We obtained normalized RPPA data for up to 258 total, cleaved, acetylated, or phosphorylated protein species from The Cancer Proteome Atlas (TCPA). Our starting point was a published RPPA-based seven-protein signature of receptor tyrosine kinase (RTK) pathway activity, computed as an unweighted sum of the seven protein measurements and shown to have prognostic value in a 445-patient renal clear cell carcinoma cohort (TCGA-KIRC). We demonstrated that strong stratification of patients into high- and low-risk groups can be achieved with a statistical approach, LASSO regression, that uses no a priori biological knowledge to select from the 233 proteins and optimally combine their RPPA measurements into a weighted risk score. Method performance was assessed using two unbiased approaches: 1) 10 iterations of 3-fold cross-validation for unbiased estimation of the hazard ratio and the difference in 5-year survival (by the Kaplan-Meier method) between predictor-defined high- and low-risk groups; and 2) a permutation test to evaluate the statistical significance of the cross-validated log-rank statistic.
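The risk-score and 5-year survival comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The protein names (`p1`, `p2`), the weights, the toy follow-up data, and the median cut-point used to define high- and low-risk groups are all assumptions made for illustration; the abstract does not state the cut-point, and in the study the weights would come from a LASSO fit on the training folds only.

```python
from statistics import median

def risk_scores(samples, weights):
    """Weighted sum of protein measurements (all weights 1.0 gives an unweighted sum)."""
    return [sum(w * s[p] for p, w in weights.items()) for s in samples]

def km_survival(times, events, horizon):
    """Kaplan-Meier estimate of S(horizon); events: 1 = death, 0 = censored."""
    surv = 1.0
    for t in sorted({t for t, e in zip(times, events) if e and t <= horizon}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - deaths / at_risk
    return surv

def survival_gap(scores, times, events, horizon=5.0):
    """S_low(horizon) - S_high(horizon), splitting at the median score (assumed cut-point)."""
    cut = median(scores)
    low = [i for i, s in enumerate(scores) if s <= cut]
    high = [i for i, s in enumerate(scores) if s > cut]
    s_low = km_survival([times[i] for i in low], [events[i] for i in low], horizon)
    s_high = km_survival([times[i] for i in high], [events[i] for i in high], horizon)
    return s_low - s_high

# Hypothetical toy data: two proteins, six patients, follow-up in years.
weights = {"p1": 0.5, "p2": -1.0}  # illustrative values, not fitted coefficients
samples = [
    {"p1": 2.0, "p2": 0.0}, {"p1": 1.0, "p2": 0.5}, {"p1": 0.0, "p2": 1.0},
    {"p1": 4.0, "p2": 0.0}, {"p1": 0.0, "p2": 2.0}, {"p1": 3.0, "p2": 0.0},
]
times = [2, 6, 4, 1, 7, 8]
events = [1, 0, 1, 1, 0, 0]

scores = risk_scores(samples, weights)
gap = survival_gap(scores, times, events, horizon=5.0)
print(round(gap, 3))  # prints 0.333
```

In a cross-validated setting, the weights would be estimated on two folds and the scores, groups, and survival gap computed on the held-out fold, so that the reported difference is not inflated by reuse of the training data.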
Results: For the first evaluation approach, the median hazard ratio between high- and low-risk groups across the held-out cross-validation folds was 2.4 for the 7-protein RTK score, compared with 3.3 for the risk score derived by applying LASSO to the training folds. Furthermore, the median difference in overall survival probability at 5 years was 32.8% for the LASSO-derived risk score, compared with 25.2% for the 7-protein RTK score. The permutation test p values were 5.0e-4 for both the RTK pathway-driven and the LASSO data-driven approaches. Finally, we demonstrated the applicability and performance of our approach for overall survival prediction in additional TCGA cohorts, namely ovarian serous cystadenocarcinoma (TCGA-OVCA), sarcoma (TCGA-SARC), and cutaneous melanoma (TCGA-SKCM).

Conclusions: The data-driven nature of our LASSO-based approach makes it versatile and particularly well suited to the discovery of unexplored protein/disease associations that could aid in therapeutic discovery.

Citation Format: Mariam M. Konate, Ming-Chung Li, Lisa McShane, Yingdong Zhao. LASSO-based protein signatures for survival prediction in human cancer cohorts [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 160.
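The permutation test on the log-rank statistic mentioned in the abstract can be sketched as follows. This is a simplified illustration, not the authors' procedure: it shuffles fixed high/low group labels against the survival outcomes. A full test of a cross-validated statistic would instead be expected to permute the outcomes and repeat the entire model-fitting and cross-validation pipeline for each permutation. The function names, the number of permutations, the add-one p-value convention, and the toy data are assumptions.

```python
import random

def logrank_statistic(times, events, groups):
    """Two-group log-rank chi-square statistic; groups coded 0/1, events 1 = death."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)                       # at risk, both groups
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        d = sum(1 for u, e in zip(times, events) if e and u == t)  # deaths at time t
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if e and u == t and g == 1)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0

def permutation_p_value(times, events, groups, n_perm=1999, seed=0):
    """Share of label shufflings with a statistic at least as large as observed."""
    rng = random.Random(seed)
    observed = logrank_statistic(times, events, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if logrank_statistic(times, events, labels) >= observed:
            hits += 1
    # Add-one convention (assumed here) keeps the p value strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy cohort: the high-risk group (1) dies early, the low-risk group (0) late.
toy_times = list(range(1, 21))
toy_events = [1] * 20
toy_groups = [1] * 10 + [0] * 10
p = permutation_p_value(toy_times, toy_events, toy_groups, n_perm=199, seed=0)
print(p)
```

With the add-one convention, the smallest attainable p value is 1 / (n_perm + 1), so the number of permutations sets the resolution of the test.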