Comprehensive and realistic simulation of tumour genomic sequencing data.

Impact factor: 3.4 (JCR Q2, Biochemistry & Molecular Biology)
NAR Cancer. Pub date: 2023-09-22; eCollection date: 2023-09-01. DOI: 10.1093/narcan/zcad051
Brian O'Sullivan, Cathal Seoighe

Abstract

Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive 'ground truth' data, assessing the accuracy of these methods is challenging. We created a computational framework to simulate tumour and matched normal sequencing data for which the source of all loci that contain non-reference bases is known, based on a phased, personalized genome. Unlike existing methods, we account for sampling errors inherent in the sequencing process. Using this framework, we assess accuracy and biases in inferred mutations and their frequencies in an established somatic mutation calling pipeline. We demonstrate bias in existing methods of mutant allele frequency estimation and show, for the first time, the observed mutation frequency spectrum corresponding to a theoretical model of tumour evolution. We highlight the impact of quality filters on detection sensitivity of clinically actionable variants and provide definitive assessment of false positive and false negative mutation calls. Our simulation framework provides an improved means to assess the accuracy of somatic mutation calling pipelines and a detailed picture of the effects of technical parameters and experimental factors on somatic mutation calling in cancer samples.
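
The abstract's point about sampling error, and the bias it can introduce once a detection filter is applied, can be illustrated with a toy model. The sketch below is not the authors' simulation framework. It assumes that each read at a locus independently carries the mutant allele with probability equal to the true variant allele fraction (binomial sampling), that contaminating normal cells are diploid, and that a hypothetical caller reports a mutation only when at least `min_alt_reads` reads support it. The function and parameter names (`expected_vaf`, `call_and_estimate`, `min_alt_reads`) are illustrative and do not come from the paper.

```python
import random


def expected_vaf(purity: float, tumour_copies: int = 2, mutant_copies: int = 1) -> float:
    """Expected variant allele fraction (VAF) of a clonal mutation.

    Toy assumption: tumour cells carry `tumour_copies` copies of the locus,
    `mutant_copies` of which are mutant, and normal cells are diploid.
    """
    mutant_alleles = purity * mutant_copies
    total_alleles = purity * tumour_copies + (1 - purity) * 2
    return mutant_alleles / total_alleles


def call_and_estimate(true_vaf: float, depth: int, n_loci: int,
                      min_alt_reads: int, seed: int = 0) -> tuple[float, float]:
    """Simulate binomial read sampling at `n_loci` loci sharing one true VAF.

    Returns (sensitivity, mean observed VAF among loci that pass the filter).
    The filter is a hypothetical minimum count of mutant-supporting reads.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_loci):
        # Each of the `depth` reads carries the mutant allele with probability true_vaf.
        alt_reads = sum(1 for _ in range(depth) if rng.random() < true_vaf)
        if alt_reads >= min_alt_reads:
            observed.append(alt_reads / depth)
    sensitivity = len(observed) / n_loci
    mean_vaf = sum(observed) / len(observed) if observed else float("nan")
    return sensitivity, mean_vaf


if __name__ == "__main__":
    print(expected_vaf(0.6))  # clonal heterozygous mutation at 60% purity
    for vaf in (0.05, 0.3):
        print(vaf, call_and_estimate(vaf, depth=30, n_loci=2000, min_alt_reads=4))
```

With a true VAF of 0.05 at 30x depth and a four-read threshold, only loci whose reads happened to over-sample the mutant allele pass the filter, so sensitivity is low and every reported VAF is at least 4/30, well above the truth. At a true VAF of 0.3 almost every locus is detected and the mean estimate lies close to 0.3. The model deliberately ignores sequencing and mapping errors, subclonal structure and copy-number changes, all of which a realistic simulator would need to handle.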
