The evaluation of different combinations of enzyme set, aligner and caller in GBS sequencing of soybean.

IF 4.4 · Zone 2, Biology · JCR Q1, Biochemical Research Methods

Plant Methods Pub Date : 2025-08-06 DOI:10.1186/s13007-025-01410-8

Aleksei Zamalutdinov, Stepan Boldyrev, Cécile Ben, Laurent Gentzbittel

Plant Methods 21(1): 106. Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12330036/pdf/


Abstract

Background: Genotyping-by-sequencing (GBS) is a cost-effective method for large-scale genotyping that is widely used across species, particularly those with large genomes. Two critical aspects of GBS are the selection of restriction enzymes for genome digestion and the optimization of data analysis pipelines. However, few studies have comprehensively examined the combined effects of enzyme choice and pipeline configuration.

Results: In this study, we created GBS libraries using three commonly used restriction enzyme sets (HindIII-NlaIII, PstI-MspI, and ApeKI) and assessed multiple SNP-calling pipelines in 15 soybean varieties. We tested four aligners (BWA-MEM, Bowtie2, BBMap, and Strobealign) and seven SNP callers (Bcftools, Stacks, DeepVariant, FreeBayes, VarScan, BBCallVariants, and GATK). Our findings reveal that enzyme choice significantly influences the number of identified SNPs, gene localization preferences, and accuracy. Furthermore, the performance of SNP callers varied markedly in terms of SNP count, precision, recall, and false discovery rate (FDR). DeepVariant exhibited the highest accuracy, with 76.0% of its SNPs intersecting with whole-genome sequencing (WGS)-derived SNPs and an FDR of 0.0095, whereas FreeBayes had a 47.8% intersection and an FDR of 0.6321.
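The metrics named above can be illustrated with a minimal sketch. It assumes the textbook set-based definitions, in which a GBS call counts as a true positive when the same site and allele appear in a WGS-derived reference set. The abstract does not state how the authors computed their FDR, and the reported figures suggest it is not simply one minus the intersection fraction, so the function below is an illustration rather than the paper's method. The names `compare_calls` and `CallMetrics` and the toy SNPs are hypothetical.

```python
from typing import NamedTuple, Set, Tuple

# A SNP is identified here by (chromosome, 1-based position, alternate allele).
# This keying is an assumption made for the illustration.
Snp = Tuple[str, int, str]


class CallMetrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float
    fdr: float


def compare_calls(called: Set[Snp], truth: Set[Snp]) -> CallMetrics:
    """Compare a call set with a reference set using plain set arithmetic."""
    tp = len(called & truth)   # called and present in the reference
    fp = len(called - truth)   # called but absent from the reference
    fn = len(truth - called)   # in the reference but not called
    precision = tp / (tp + fp) if called else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    fdr = fp / (tp + fp) if called else 0.0  # equals 1 - precision here
    return CallMetrics(tp, fp, fn, precision, recall, fdr)


# Toy data, invented for the example (not taken from the study).
truth = {("Chr01", 100, "A"), ("Chr01", 250, "T"),
         ("Chr02", 40, "G"), ("Chr02", 90, "C")}
called = {("Chr01", 100, "A"), ("Chr01", 250, "T"),
          ("Chr02", 40, "G"), ("Chr03", 7, "T")}

metrics = compare_calls(called, truth)
print(metrics)
```

In a real GBS-versus-WGS comparison, the reference set would normally be restricted to the regions that the GBS library actually covers; otherwise recall is dominated by sites the reduced-representation library could never have sampled.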

Conclusions: Our findings underscore the importance of optimizing both enzyme selection for sequencing libraries and data analysis pipelines to ensure robust and reproducible results. This study provides a general framework for designing large-scale genotyping experiments aimed at specific quality and quantity requirements in various plant species.
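To give a sense of the size of the design space, the enzyme sets, aligners and callers listed in the Results can be enumerated as a full grid. This is only a sketch: the abstract does not say that every combination was actually run (some callers may be tied to their own alignment workflow), and the variable names are hypothetical.

```python
from itertools import product

# Tool and enzyme names as listed in the abstract.
ENZYME_SETS = ["HindIII-NlaIII", "PstI-MspI", "ApeKI"]
ALIGNERS = ["BWA-MEM", "Bowtie2", "BBMap", "Strobealign"]
CALLERS = ["Bcftools", "Stacks", "DeepVariant", "FreeBayes",
           "VarScan", "BBCallVariants", "GATK"]

# Full factorial grid: one tuple per (enzyme set, aligner, caller).
combinations = list(product(ENZYME_SETS, ALIGNERS, CALLERS))

print(len(combinations))  # 3 * 4 * 7 = 84
for enzyme_set, aligner, caller in combinations[:3]:
    print(f"{enzyme_set} | {aligner} | {caller}")
```

Each tuple could then key one row of a results table holding the SNP count, precision, recall and FDR for that pipeline.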


Source journal

Plant Methods (Biology - Plant Sciences)

CiteScore: 9.20

Self-citation rate: 3.90%

Articles published: 121

Review time: 2 months

About the journal: Plant Methods is an open access, peer-reviewed, online journal for the plant research community that encompasses all aspects of technological innovation in the plant sciences. There is no doubt that we have entered an exciting new era in plant biology. The completion of the Arabidopsis genome sequence and the rapid progress being made in other plant genomics projects are providing unparalleled opportunities for progress in all areas of plant science. Nevertheless, enormous challenges lie ahead if we are to understand the function of every gene in the genome, and how the individual parts work together to make the whole organism. Achieving these goals will require an unprecedented collaborative effort, combining high-throughput, system-wide technologies with more focused approaches that integrate traditional disciplines such as cell biology, biochemistry and molecular genetics. Technological innovation is probably the most important catalyst for progress in any scientific discipline. Plant Methods’ goal is to stimulate the development and adoption of new and improved techniques and research tools and, where appropriate, to promote consistency of methodologies for better integration of data from different laboratories.