Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis

IF 6.2 2区生物学 Q1 BIOCHEMISTRY & MOLECULAR BIOLOGY

Genome research Pub Date : 2024-10-15 DOI:10.1101/gr.279449.124

Romain Derelle, Johanna von Wachsmann, Tommi Mäklin, Joel Hellewell, Timothy Russell, Ajit Lalvani, Leonid Chindelevitch, Nicholas J. Croucher, Simon R. Harris, John A. Lees

{"title":"Seamless, rapid, and accurate analyses of outbreak genomic data using split k-mer analysis","authors":"Romain Derelle, Johanna von Wachsmann, Tommi Mäklin, Joel Hellewell, Timothy Russell, Ajit Lalvani, Leonid Chindelevitch, Nicholas J. Croucher, Simon R. Harris, John A. Lees","doi":"10.1101/gr.279449.124","DOIUrl":null,"url":null,"abstract":"Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split <em>k</em>-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"78 1","pages":""},"PeriodicalIF":6.2000,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1101/gr.279449.124","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Sequence variation observed in populations of pathogens can be used for important public health and evolutionary genomic analyses, especially outbreak analysis and transmission reconstruction. Identifying this variation is typically achieved by aligning sequence reads to a reference genome, but this approach is susceptible to reference biases and requires careful filtering of called genotypes. There is a need for tools that can process this growing volume of bacterial genome data, providing rapid results, but that remain simple so they can be used without highly trained bioinformaticians, expensive data analysis, and long-term storage and processing of large files. Here we describe split k-mer analysis (SKA2), a method that supports both reference-free and reference-based mapping to quickly and accurately genotype populations of bacteria using sequencing reads or genome assemblies. SKA2 is highly accurate for closely related samples, and in outbreak simulations, we show superior variant recall compared with reference-based methods, with no false positives. SKA2 can also accurately map variants to a reference and be used with recombination detection methods to rapidly reconstruct vertical evolutionary history. SKA2 is many times faster than comparable methods and can be used to add new genomes to an existing call set, allowing sequential use without the need to reanalyze entire collections. With an inherent absence of reference bias, high accuracy, and a robust implementation, SKA2 has the potential to become the tool of choice for genotyping bacteria. SKA2 is implemented in Rust and is freely available as open-source software.

查看原文本刊更多论文

利用分裂 k-mer 分析法无缝、快速、准确地分析疫情基因组数据

在病原体种群中观察到的序列变异可用于重要的公共卫生和进化基因组分析，尤其是疫情分析和传播重建。识别这种变异通常是通过将序列读数与参考基因组进行比对来实现的，但这种方法容易受到参考偏差的影响，而且需要仔细过滤被调用的基因型。我们需要能处理日益增长的细菌基因组数据的工具，这些工具既要能快速提供结果，又要简单易用，无需训练有素的生物信息学家、昂贵的数据分析以及长期存储和处理大量文件。在这里，我们介绍了分裂 k-mer分析（SKA2），这是一种支持无参考文献和基于参考文献制图的方法，可利用测序读数或基因组组装快速准确地对细菌群体进行基因分型。SKA2 对密切相关的样本具有很高的准确性，在疫情模拟中，我们发现与基于参考的方法相比，SKA2 具有更高的变异召回率，而且没有假阳性。SKA2 还能准确地将变异映射到参考文献，并与重组检测方法一起用于快速重建垂直进化史。SKA2 比同类方法快许多倍，可用于将新基因组添加到现有的调用集，从而可连续使用，而无需重新分析整个集合。SKA2 固有的无参考偏差、高准确性和强大的实施能力，有望成为细菌基因分型的首选工具。SKA2 采用 Rust 语言实现，是免费的开源软件。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Genome research 生物-生化与分子生物学

CiteScore

12.40

自引率

1.40%

发文量

140

审稿时长

6 months

期刊介绍： Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine. Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies. New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal''s web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.