Reference-Free Variant Calling with Local Graph Construction with ska lo (SKA).

Impact factor: 11 · CAS Zone 1 (Biology) · JCR Q1 (Biochemistry & Molecular Biology)

Molecular Biology and Evolution · Pub Date: 2025-04-01 · DOI: 10.1093/molbev/msaf077

Romain Derelle, Kieran Madon, Joel Hellewell, Víctor Rodríguez-Bouza, Nimalan Arinaminpathy, Ajit Lalvani, Nicholas J Croucher, Simon R Harris, John A Lees, Leonid Chindelevitch


Citations: 0

Abstract

The study of genomic variants is increasingly important for public health surveillance of pathogens. Traditional variant-calling methods from whole-genome sequencing data rely on reference-based alignment, which can introduce biases and require significant computational resources. Alignment- and reference-free approaches offer an alternative by leveraging k-mer-based methods, but existing implementations often suffer from sensitivity limitations, particularly in high mutation density genomic regions. Here, we present ska lo, a graph-based algorithm that aims to identify within-strain variants in pathogen whole-genome sequencing data by traversing a colored De Bruijn graph and building variant groups (i.e. sets of variant combinations). Through in silico benchmarking and real-world dataset analyses, we demonstrate that ska lo achieves high sensitivity in single-nucleotide polymorphism (SNP) calls while also enabling the detection of insertions and deletions, as well as SNP positioning on a reference genome for recombination analyses. These findings highlight ska lo as a simple, fast, and effective tool for pathogen genomic epidemiology, extending the range of reference-free variant-calling approaches. ska lo is freely available as part of the SKA program (https://github.com/bacpop/ska.rust).
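To make the idea of calling variants by traversing a colored de Bruijn graph concrete, here is a minimal toy sketch in Python. It is not the ska lo algorithm: it assumes forward-strand k-mers only, detects only isolated SNP "bubbles" (no indels, no variant groups), and the sample names, sequences, and function names are all hypothetical.

```python
from collections import defaultdict

def build_colored_graph(samples, k):
    """Map each k-mer to the set of sample names (its colors) that contain it."""
    colors = defaultdict(set)
    for name, seq in samples.items():
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(name)
    return colors

def successors(kmer, colors):
    """K-mers present in the graph that overlap `kmer` by k-1 bases."""
    return [kmer[1:] + b for b in "ACGT" if kmer[1:] + b in colors]

def follow_branch(start, colors, max_len):
    """Walk forward from `start` while the path does not branch, up to max_len k-mers."""
    path = [start]
    while len(path) < max_len:
        nxt = successors(path[-1], colors)
        if len(nxt) != 1:
            break
        path.append(nxt[0])
    return path

def find_snp_bubbles(colors, k):
    """Return (allele_a, allele_b, colors_a, colors_b) for each simple SNP bubble.

    A SNP produces two branches of exactly k k-mers that rejoin at a shared k-mer.
    """
    bubbles = []
    for kmer in sorted(colors):
        branches = successors(kmer, colors)
        if len(branches) != 2:
            continue
        path_a, path_b = (follow_branch(b, colors, k + 1) for b in branches)
        if len(path_a) == len(path_b) == k + 1 and path_a[k] == path_b[k]:
            # Samples carrying an allele are those present on every k-mer of its branch.
            colors_a = set.intersection(*(colors[x] for x in path_a[:k]))
            colors_b = set.intersection(*(colors[x] for x in path_b[:k]))
            bubbles.append((path_a[0][-1], path_b[0][-1], colors_a, colors_b))
    return bubbles

# Hypothetical toy data: three samples differing at a single position (C vs T).
samples = {
    "s1": "TTGCCGTACGCTAACTG",
    "s2": "TTGCCGTACGCTAACTG",
    "s3": "TTGCCGTATGCTAACTG",
}
k = 5
graph = build_colored_graph(samples, k)
bubbles = find_snp_bubbles(graph, k)
for a, b, ca, cb in bubbles:
    print(f"{a}/{b}: {sorted(ca)} vs {sorted(cb)}")
```

No reference genome or alignment is involved: the variant and the samples carrying each allele are read directly from the graph's branch structure and k-mer colors. Two SNPs closer than k bases apart would break this simple bubble test, which illustrates the kind of sensitivity limitation in high mutation density regions that the abstract mentions.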

Source journal

Molecular Biology and Evolution (Biology: Evolutionary Biology)

CiteScore: 19.70
Self-citation rate: 3.70%
Articles published: 257
Review time: 1 month

Journal description: Molecular Biology and Evolution publishes research at the interface of molecular (including genomics) and evolutionary biology. It considers manuscripts containing patterns, processes, and predictions at all levels of organization: population, taxonomic, functional, and phenotypic. It is interested in fundamental discoveries, new and improved methods, resources, technologies, and theories advancing evolutionary research. It also publishes balanced reviews of recent developments in genome evolution and forward-looking perspectives suggesting future directions in molecular evolution applications.