Evaluation of 10 Different Pipelines for Bacterial Single-Nucleotide Variant Detection
Zi-Hao Hu, Ying Wang, Long Yang, Qing-Yi Cao, Ming Ling, Xiao-Hua Meng, Yao Chen, Shu-Jun Ni, Zhi Chen, Cheng-Zhi Liu, Kun-Kai Su
Infectious Microbes & Diseases, published 2023-11-22
DOI: 10.1097/IM9.0000000000000134 (https://doi.org/10.1097/IM9.0000000000000134)
Abstract
Bacterial genome sequencing is a powerful technique for studying the genetic diversity and evolution of microbial populations. However, detecting genomic variants from sequencing data is challenging because of contamination, sequencing errors, and the presence of multiple strains of the same species. Several bioinformatics tools have been developed to address these issues, but their performance and accuracy have not been systematically evaluated. In this study, we compared 10 variant detection pipelines using 18 simulated and 17 real high-throughput sequencing datasets from a set of representative bacteria. We assessed the sensitivity of each pipeline under different conditions of coverage, simulation and strain diversity. We also demonstrated the application of these tools by identifying consistent mutations in a dataset of Staphylococcus hominis sequenced 30 times. We found that HaplotypeCaller from the GATK tool set, but not Mutect2, showed the best performance in terms of accuracy and robustness. CFSAN and Snippy performed less well on several simulated and real sequencing datasets. Our results provide a comprehensive benchmark and guidance for choosing the optimal variant detection pipeline for high-throughput bacterial genome sequencing data.
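The abstract does not say how sensitivity or "consistent" mutations were computed, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes a variant can be keyed by a hypothetical tuple of contig, position, reference allele and alternate allele, that sensitivity is the fraction of known (simulated) variants a pipeline recovers, and that a mutation is "consistent" if it is called in at least a chosen fraction of replicate runs. The function names and the toy data are illustrative.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Assumed variant key: (contig, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def sensitivity(truth: Set[Variant], called: Set[Variant]) -> float:
    """Fraction of true variants recovered by a pipeline: TP / (TP + FN)."""
    if not truth:
        raise ValueError("truth set is empty; sensitivity is undefined")
    true_positives = len(truth & called)
    return true_positives / len(truth)


def consistent_variants(
    replicates: Iterable[Set[Variant]], min_fraction: float = 1.0
) -> Set[Variant]:
    """Variants called in at least min_fraction of the replicate runs."""
    runs = list(replicates)
    if not runs:
        return set()
    # Count in how many runs each variant appears (each run is a set,
    # so a variant is counted at most once per run).
    counts = Counter(v for run in runs for v in run)
    needed = min_fraction * len(runs)
    return {v for v, n in counts.items() if n >= needed}


# Toy data (invented for illustration only).
truth = {
    ("contig1", 100, "A", "G"),
    ("contig1", 250, "C", "T"),
    ("contig1", 900, "G", "A"),
    ("contig1", 1200, "T", "C"),
}
called = {
    ("contig1", 100, "A", "G"),
    ("contig1", 250, "C", "T"),
    ("contig1", 900, "G", "A"),
    ("contig1", 5000, "A", "T"),  # false positive, does not affect sensitivity
}
print(sensitivity(truth, called))  # 3 of 4 true variants recovered: 0.75
```

With real data, the sets would be built from each pipeline's VCF output. Exact tuple matching is a simplification, since pipelines can represent the same variant differently and would normally need normalization before comparison.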