Assessing the validity of driver gene identification tools for targeted genome sequencing data

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics Advances. Pub Date: 2024-05-23. DOI: 10.1093/bioadv/vbae073

Felipe Rojas-Rodríguez, Marjanka K Schmidt, S. Canisius


Citations: 0

Abstract

Most cancer driver gene identification tools have been developed for whole-exome sequencing data. Targeted sequencing is a popular alternative to whole-exome sequencing for large cancer studies because it offers greater depth at a lower cost per tumor. Unlike whole-exome sequencing, targeted sequencing enables mutation calling only for a selected subset of genes. Whether existing driver gene identification tools remain valid in that context has not previously been studied.

We evaluated the validity of seven popular driver gene identification tools when applied to targeted sequencing data. Based on whole-exome data of 14 different cancer types from TCGA, we constructed matching targeted datasets by keeping only the mutations overlapping with the pan-cancer MSK-IMPACT panel and, in the case of breast cancer, also the breast-cancer-specific B-CAST panel. We then compared the driver gene predictions obtained on whole-exome and targeted mutation data for each of the seven tools. Differences in how the tools model background mutation rates were the most important determinant of their validity on targeted sequencing data. Based on our results, we recommend OncodriveFML, OncodriveCLUSTL, 20/20+, dNdScv, and ActiveDriver for driver gene identification in targeted sequencing data, whereas MutSigCV and DriverML are best avoided in that context.

Supplementary data are available at Bioinformatics Advances online.
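The construction described in the abstract (restrict whole-exome mutations to a panel, then compare driver calls between the two settings) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Mutation` and `Region` types, the function names, the gene names, and all coordinates are hypothetical, and the comparison into retained, lost, and gained genes is only one plausible way to summarize agreement.

```python
from typing import NamedTuple


class Mutation(NamedTuple):
    gene: str
    chrom: str
    pos: int  # 1-based position


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def restrict_to_panel(mutations, panel):
    """Keep only mutations whose position falls inside some panel region."""
    return [
        m for m in mutations
        if any(r.chrom == m.chrom and r.start <= m.pos <= r.end for r in panel)
    ]


def compare_predictions(wes_drivers, targeted_drivers, panel_genes):
    """Compare driver calls, restricted to genes the panel can observe."""
    # Whole-exome drivers outside the panel cannot be recovered from
    # targeted data, so they are excluded from the expected set.
    expected = set(wes_drivers) & set(panel_genes)
    found = set(targeted_drivers)
    return {
        "retained": sorted(expected & found),
        "lost": sorted(expected - found),
        "gained": sorted(found - expected),
    }


# Toy data: gene names and coordinates are invented for illustration.
mutations = [
    Mutation("GENE_A", "chr1", 150),
    Mutation("GENE_A", "chr1", 900),
    Mutation("GENE_B", "chr2", 50),
    Mutation("GENE_C", "chr3", 500),
]
panel = [Region("chr1", 100, 200), Region("chr3", 400, 600)]

targeted = restrict_to_panel(mutations, panel)
summary = compare_predictions(
    wes_drivers=["GENE_A", "GENE_B", "GENE_C"],
    targeted_drivers=["GENE_A", "GENE_D"],
    panel_genes=["GENE_A", "GENE_C", "GENE_D"],
)
```

In this toy case, a "gained" gene is one called as a driver only on the panel-restricted data, which is the kind of discrepancy a tool's background mutation rate model might produce when it sees far fewer genes.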


Source journal

Bioinformatics Advances

CiteScore

1.60

Self-citation rate

0.00%

Articles published