Whole Genome Variant Dataset for Enriching Studies across 18 Different Cancers

Onco, 2(2): 129-144. Publication date: 2022-06-01; Epub date: 2022-06-17. DOI: 10.3390/onco2020009

John Torcivia, Kawther Abdilleh, Fabian Seidl, Owais Shahzada, Rebecca Rodriguez, David Pot, Raja Mazumder

Abstract

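Whole genome sequencing (WGS) has helped revolutionize biology, but extracting valuable inferences from WGS data remains a computational challenge. Here, we present the cancer-associated variants from The Cancer Genome Atlas (TCGA) WGS dataset. This dataset allows cancer researchers to extend their analyses beyond the exonic regions of the genome to the entire genome. A total of 1342 WGS alignments available from the consortium were processed with VarScan2 and deposited in the NCI Cancer Cloud. The sample set covers 18 different cancers and contains 157,313,519 pooled (non-unique) cancer-associated single-nucleotide variations (SNVs) across all samples. Samples carried an average of 117,223 SNVs each (range: 1111 to 775,470; standard deviation: 163,273). The dataset was incorporated into BigQuery, which allows fast access and cross-mapping, enabling researchers to enrich their current studies with a wealth of newly available genomic data.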
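The summary statistics in the abstract are internally consistent: dividing the pooled SNV count by the number of processed alignments gives the reported per-sample mean, assuming one alignment per sample. The Python sketch below checks this and shows how the same summary (mean, range, standard deviation) could be recomputed from per-sample counts. The SQL string, its table name `my-project.my_dataset.wgs_snvs`, the column `sample_id`, and the helper function names are hypothetical placeholders, not the published BigQuery schema. The use of the sample (n - 1) standard deviation is also an assumption, since the abstract does not say which definition was used.

```python
import statistics

# Figures reported in the abstract.
TOTAL_POOLED_SNVS = 157_313_519  # pooled (non-unique) SNVs across all samples
NUM_ALIGNMENTS = 1342            # WGS alignments processed with VarScan2
REPORTED_MEAN = 117_223          # reported average SNVs per sample

# Hypothetical BigQuery SQL: the table and column names are placeholders,
# not the published schema. It would yield one SNV count per sample.
PER_SAMPLE_COUNT_SQL = """
SELECT sample_id, COUNT(*) AS snv_count
FROM `my-project.my_dataset.wgs_snvs`
GROUP BY sample_id
"""


def implied_mean(total: int, n_samples: int) -> float:
    """Mean SNVs per sample implied by a pooled total.

    Assumes each alignment corresponds to exactly one sample.
    """
    return total / n_samples


def summarize_snv_counts(counts: list[int]) -> dict[str, float]:
    """Mean, min, max and sample (n - 1) standard deviation of per-sample SNV counts."""
    if len(counts) < 2:
        raise ValueError("need at least two samples for a standard deviation")
    return {
        "mean": statistics.mean(counts),
        "min": min(counts),
        "max": max(counts),
        "stdev": statistics.stdev(counts),
    }


if __name__ == "__main__":
    mean = implied_mean(TOTAL_POOLED_SNVS, NUM_ALIGNMENTS)
    print(f"implied mean: {mean:.1f}")  # 117223.2, matching the reported 117,223
    # Toy counts for illustration only; these are not values from the dataset.
    print(summarize_snv_counts([1111, 50_000, 775_470]))
```

Rounding the implied mean (about 117,223.2) reproduces the reported figure, so the pooled total, sample count, and mean agree. Reproducing the reported range and standard deviation would require the actual per-sample counts from the deposited data.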
