SurviveAI: Long Term Survival Prediction of Cancer Patients Based on Somatic RNA-Seq Expression.

IF 2.5 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Cancer Informatics Pub Date : 2022-10-07 eCollection Date: 2022-01-01 DOI:10.1177/11769351221127875

Omri Nayshool, Nitzan Kol, Elisheva Javaski, Ninette Amariglio, Gideon Rechavi


Citations: 1

Abstract


Motivation: Prediction of cancer outcome is a major challenge in oncology and is essential for treatment planning. Repositories such as The Cancer Genome Atlas (TCGA) contain vast amounts of data for many types of cancers. Our goal was to create reliable prediction models using TCGA data and validate them using an external dataset.

Results: For 16 TCGA cancer-type cohorts, we optimized a Random Forest prediction model using a parameter grid search followed by a backward feature elimination loop for dimensionality reduction. After each feature was removed, the model was retrained and the area under the receiver operating characteristic curve (AUC-ROC) was calculated on test data. Five prediction models achieved an AUC-ROC greater than 0.80. We used Clinical Proteomic Tumor Analysis Consortium v3 (CPTAC3) data for validation. The most enriched pathways for the top models were those involved in basic functions related to tumorigenesis and organ development. Enrichment was explored for 2 prediction models of the TCGA-KIRP cohort, one composed of 42 genes (AUC-ROC = 0.86) and the other of 300 genes (AUC-ROC = 0.85). The most enriched networks for the two models share only 5 network nodes: DMBT1, IL11, HOXB6, TRIB3, and PIM1. These genes play a significant role in renal cancer and might be used for prognosis prediction and as candidate therapeutic targets.

Availability and implementation: The prediction models were created and tested using the Python scikit-learn package. They are freely accessible through a user-friendly web interface, which we call surviveAI, at https://tinyurl.com/surviveai.
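
The Results paragraph describes a grid search over Random Forest parameters followed by a backward feature elimination loop scored by test-set AUC-ROC. The following is a minimal sketch of that kind of pipeline in scikit-learn, not the authors' code. Several details are assumptions made for illustration only: the data are synthetic (`make_classification` stands in for an expression matrix with a binary survival label), the parameter grid is arbitrary, the helper name `backward_elimination` is hypothetical, and features are ranked by the forest's impurity-based `feature_importances_`, since the abstract does not state which removal criterion was used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def backward_elimination(X_train, y_train, X_test, y_test, params,
                         min_features=2, seed=0):
    """Remove one feature at a time; retrain and score on test data after each removal."""
    kept = list(range(X_train.shape[1]))  # column indices still in the model
    history = []  # (feature indices used, test AUC-ROC) for every model fitted
    while True:
        model = RandomForestClassifier(random_state=seed, **params)
        model.fit(X_train[:, kept], y_train)
        proba = model.predict_proba(X_test[:, kept])[:, 1]
        history.append((tuple(kept), roc_auc_score(y_test, proba)))
        if len(kept) <= min_features:
            break
        # Assumption: drop the feature with the lowest impurity-based importance.
        weakest = int(np.argmin(model.feature_importances_))
        del kept[weakest]
    return history


# Synthetic stand-in for a gene-expression matrix and a binary outcome label.
X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: parameter grid search (illustrative grid, cross-validated AUC-ROC).
grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid,
                      scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# Step 2: backward feature elimination with the tuned parameters.
history = backward_elimination(X_train, y_train, X_test, y_test,
                               search.best_params_)
best_features, best_auc = max(history, key=lambda item: item[1])
print(f"best subset: {len(best_features)} features, test AUC-ROC = {best_auc:.2f}")
```

Picking the feature subset by its test-set score, as this sketch does, gives an optimistic estimate of performance, which is one reason an external dataset such as CPTAC3 is useful for validation.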


Source journal

Cancer Informatics Medicine-Oncology

CiteScore

3.00

Self-citation rate

5.00%

Number of publications

Review time

8 weeks

Journal description: The field of cancer research relies on advances in many other disciplines, including omics technology, mass spectrometry, radio imaging, computer science, and biostatistics. Cancer Informatics provides open access to peer-reviewed high-quality manuscripts reporting bioinformatics analysis of molecular genetics and/or clinical data pertaining to cancer, emphasizing the use of machine learning, artificial intelligence, statistical algorithms, advanced imaging techniques, data visualization, and high-throughput technologies. As the leading journal dedicated exclusively to reporting the use of computational methods in cancer research and practice, Cancer Informatics leverages methodological improvements in systems biology, genomics, proteomics, metabolomics, and molecular biochemistry into the fields of cancer detection, treatment, classification, risk-prediction, prevention, outcome, and modeling.