GeCKO: user-friendly workflows for genotyping complex genomes using target enrichment capture. A use case on the large tetraploid durum wheat genome.

IF 4.7 | Zone 2 (Biology) | JCR Q1 (Biochemical Research Methods)
Morgane Ardisson, Johanna Girodolle, Stéphane De Mita, Pierre Roumet, Vincent Ranwez
Plant Methods, vol. 20, no. 1, article 103. Published 2024-07-13. DOI: https://doi.org/10.1186/s13007-024-01210-6. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11246579/pdf/
Citations: 0

Abstract

Background: Genotyping of individuals plays a pivotal role in many biological analyses. The choice of technology depends on multiple factors, including genomic constraints, the number of targeted loci and individuals, cost, and the ease of sample preparation and data processing. Target enrichment capture of specific polymorphic regions has emerged as a flexible and cost-effective genome reduction method for genotyping, particularly well suited to very large genomes. However, this approach requires complex bioinformatics processing to extract genotyping data from raw reads. Existing workflows predominantly cater to phylogenetic inference, leaving a gap in user-friendly tools for genotyping analysis based on capture methods. In response to these challenges, we have developed GeCKO (Genotyping Complexity Knocked-Out). To assess the effectiveness of combining target enrichment capture with GeCKO, we conducted a case study on durum wheat domestication history, involving sequencing, processing, and analyzing variants in four relevant durum wheat groups.

Results: GeCKO encompasses four distinct workflows, each designed for specific steps of genomic data processing: (i) read demultiplexing and trimming for data cleaning, (ii) read mapping to align sequences to a reference genome, (iii) variant calling to identify genetic variants, and (iv) variant filtering. Each workflow in GeCKO can be easily configured and is executable across diverse computational environments. The workflows generate comprehensive HTML reports including key summary statistics and illustrative graphs, ensuring traceable, reproducible results and facilitating straightforward quality assessment. A specific innovation within GeCKO is its 'targeted remapping' feature, specifically designed for efficient treatment of targeted enrichment capture data. This process consists of extracting reads mapped to the targeted regions, constructing a smaller sub-reference genome, and remapping the reads to this sub-reference, thereby enhancing the efficiency of subsequent steps.
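
The targeted remapping idea described above can be illustrated with a minimal Python sketch. This is not GeCKO's code: GeCKO works on real alignment and reference files, whereas this toy version uses plain strings and tuples. The names `Region`, `merge_regions`, `build_subreference`, `read_overlaps`, and `to_full_coordinates`, the padding parameter, and the `chrom:start-end` contig naming are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A target region in 0-based, half-open coordinates (hypothetical type)."""
    chrom: str
    start: int
    end: int


def merge_regions(regions, pad=0):
    """Pad each region by `pad` bases, then merge overlapping or adjacent ones."""
    merged = []
    for r in sorted(regions, key=lambda r: (r.chrom, r.start)):
        start, end = max(0, r.start - pad), r.end + pad
        if merged and merged[-1].chrom == r.chrom and start <= merged[-1].end:
            last = merged[-1]
            merged[-1] = Region(last.chrom, last.start, max(last.end, end))
        else:
            merged.append(Region(r.chrom, start, end))
    return merged


def build_subreference(reference, regions):
    """Cut each region out of the full reference, one small contig per region."""
    subref = {}
    for r in regions:
        end = min(r.end, len(reference[r.chrom]))  # clip to chromosome length
        subref[f"{r.chrom}:{r.start}-{end}"] = reference[r.chrom][r.start:end]
    return subref


def read_overlaps(chrom, start, end, regions):
    """True if a read mapped at [start, end) on `chrom` touches any region."""
    return any(r.chrom == chrom and start < r.end and end > r.start for r in regions)


def to_full_coordinates(contig_name, pos):
    """Translate a position on a sub-reference contig back to the full genome."""
    chrom, span = contig_name.rsplit(":", 1)
    return chrom, int(span.split("-")[0]) + pos


# Toy data standing in for a genome, capture targets, and mapped reads.
reference = {"chr1A": "ACGTACGTACGTACGTACGT", "chr1B": "TTTTGGGGCCCCAAAA"}
targets = [Region("chr1A", 2, 6), Region("chr1A", 5, 9), Region("chr1B", 4, 8)]
reads = [("chr1A", 0, 3), ("chr1A", 12, 18), ("chr1B", 8, 12)]

zones = merge_regions(targets, pad=1)
subref = build_subreference(reference, zones)
kept = [read for read in reads if read_overlaps(*read, zones)]
```

In this sketch the reads in `kept` are the ones that would be remapped against `subref`, and `to_full_coordinates` recovers genome-wide positions for variants called on the smaller contigs. The padding around each target is a design choice assumed here so that reads hanging over a target edge still have sequence to align to.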

Conclusions: The case study results showed the expected levels of intra-group diversity and inter-group differentiation, confirming the method's effectiveness for genotyping and analyzing genetic diversity in species with complex genomes. GeCKO streamlined the data processing, significantly improving computational performance and efficiency. The targeted remapping enabled straightforward SNP calling in durum wheat, a task otherwise complicated by the species' large genome size. This illustrates GeCKO's potential applications in various biological research contexts.

Source journal: Plant Methods (Biology: Plant Sciences)
CiteScore: 9.20
Self-citation rate: 3.90%
Articles published: 121
Review time: 2 months
About the journal: Plant Methods is an open access, peer-reviewed, online journal for the plant research community that encompasses all aspects of technological innovation in the plant sciences. There is no doubt that we have entered an exciting new era in plant biology. The completion of the Arabidopsis genome sequence and the rapid progress being made in other plant genomics projects are providing unparalleled opportunities for progress in all areas of plant science. Nevertheless, enormous challenges lie ahead if we are to understand the function of every gene in the genome, and how the individual parts work together to make the whole organism. Achieving these goals will require an unprecedented collaborative effort, combining high-throughput, system-wide technologies with more focused approaches that integrate traditional disciplines such as cell biology, biochemistry and molecular genetics. Technological innovation is probably the most important catalyst for progress in any scientific discipline. Plant Methods’ goal is to stimulate the development and adoption of new and improved techniques and research tools and, where appropriate, to promote consistency of methodologies for better integration of data from different laboratories.