ParallNormal: An Efficient Variant Calling Pipeline for Unmatched Sequencing Data

Laura Follia, Fabio Tordini, S. Pernice, G. Romano, G. Piaggeschi, G. Ferrero
2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP)
Publication date: 2018-03-21
DOI: 10.1109/PDP2018.2018.00074
https://doi.org/10.1109/PDP2018.2018.00074

Abstract

Next-generation sequencing is moving closer to clinical application in oncology, because it allows the identification of tumor-specific mutations acquired during cancer development, progression, and resistance to therapy. As sequencing technology evolves, novel computational approaches are needed to rapidly process sequencing data into a list of clinically relevant genomic variants. Because sequencing data from both tumors and their matched normal samples are not always available (unmatched data), there is a need for a computational pipeline that performs variant calling on unmatched data. Although many accurate and precise variant calling algorithms exist, an efficient approach is still lacking. Here, we propose a parallel pipeline (ParallNormal) designed to efficiently identify genomic variants from whole-exome sequencing data in the absence of a matched normal sample. ParallNormal integrates well-known algorithms such as BWA and GATK, a novel tool for duplicate removal (DuplicateRemove), and the FreeBayes variant calling algorithm. A re-engineered implementation of FreeBayes, optimized for execution on modern multi-core architectures, is also proposed. ParallNormal was applied to whole-exome sequencing data from pancreatic cancer samples without considering their matched normal samples. The robustness of ParallNormal was tested against results from the same dataset analyzed with matched normal samples, considering genes involved in pancreatic carcinogenesis. Our pipeline was able to confirm most of the variants identified using matched normal data.
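
The robustness check described in the abstract, comparing the unmatched call set against the calls obtained with matched normal samples, could be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the two call sets were compared, so the `Variant` key (chromosome, position, reference and alternate allele), the `confirmed_fraction` helper, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical variant key: position plus alleles (an assumption)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def confirmed_fraction(matched: Iterable[Variant],
                       unmatched: Iterable[Variant]) -> float:
    """Fraction of matched-normal variants also found in the unmatched calls."""
    matched_set: Set[Variant] = set(matched)
    if not matched_set:
        # No reference calls to confirm; report zero rather than divide by zero.
        return 0.0
    return len(matched_set & set(unmatched)) / len(matched_set)


# Toy data invented for illustration only; these are NOT results from the paper.
matched_calls = [
    Variant("chr12", 100, "C", "T"),
    Variant("chr17", 200, "G", "A"),
    Variant("chr9", 300, "A", "G"),
    Variant("chr18", 400, "T", "C"),
]
unmatched_calls = [
    Variant("chr12", 100, "C", "T"),
    Variant("chr17", 200, "G", "A"),
    Variant("chr9", 300, "A", "G"),
    Variant("chr1", 500, "G", "T"),  # extra call absent from the matched set
]

fraction = confirmed_fraction(matched_calls, unmatched_calls)
print(f"{fraction:.2f}")  # 3 of the 4 matched-normal variants are confirmed
```

Exact-key comparison is the simplest possible choice here; a real comparison would likely also need to normalize variant representation and restrict the comparison to the genes of interest before intersecting the two sets.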