Genome Evaluation Pipeline (GEP): a fully automated quality control tool for parallel evaluation of genome assemblies.

Bioinformatics Advances (impact factor 2.8; JCR Q2, Mathematical & Computational Biology)
Published: 2025-06-26; eCollection date: 2025-01-01; DOI: 10.1093/bioadv/vbaf147
James Sullivan, Diego De Panis, Valentina Galeone, Camila J Mazzoni

Abstract

Summary: The ability to generate high-quality genome assemblies is paramount for understanding global biodiversity. To streamline quality control of the large number of assemblies currently being generated, we developed the Genome Evaluation Pipeline (GEP). Our Snakemake-based command-line tool has two modes and was designed with the recommendations of several international projects in mind, in order to standardize genome evaluation across the Tree of Life. In Build Mode, GEP generates k-mer databases from high-accuracy sequencing reads, with optional quality control and pre-processing steps. Evaluate Mode uses these databases to assess genome assembly quality with standard, gene-content, and k-mer-based metrics. Key features include the reference-free assessment of genome characteristics such as size, heterozygosity, and ploidy, and the comparison of k-mer profiles to evaluate assembly completeness and correctness. GEP also supports flexible input options for k-mer databases and the integration of Hi-C data for visual inspection of the assembly structure. Finally, the tool produces comprehensive reports summarizing the contiguity, completeness, and correctness metrics of genome assemblies, which facilitates comparing assemblies and selecting among them for downstream studies.

Availability and implementation: GEP is publicly available as a Git repository at https://git.imp.fu-berlin.de/begendiv/gep.
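
To make the metrics named in the Summary more concrete, the following is a minimal Python sketch of one contiguity metric (N50) and one k-mer-based completeness estimate. It is an illustration only and is not taken from GEP: the function names `n50`, `count_kmers`, and `kmer_completeness`, the `min_count` threshold, and the choice to ignore reverse complements are all assumptions made for this example, and a real pipeline would rely on dedicated k-mer counting tools rather than in-memory dictionaries.

```python
from collections import Counter


def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def count_kmers(sequences, k):
    """Count every k-mer in an iterable of DNA strings.

    Simplification (assumption): strands are not merged, i.e. a k-mer
    and its reverse complement are counted separately.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_completeness(read_counts, assembly_counts, min_count=2):
    """Fraction of reliable read k-mers that also occur in the assembly.

    A read k-mer is treated as reliable when it is seen at least
    `min_count` times (hypothetical threshold), which filters out most
    k-mers created by sequencing errors.
    """
    solid = {kmer for kmer, c in read_counts.items() if c >= min_count}
    if not solid:
        return 0.0
    found = sum(1 for kmer in solid if kmer in assembly_counts)
    return found / len(solid)


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "CGTACG", "TTTA"]
    assembly = ["ACGTA"]
    k = 3
    read_db = count_kmers(reads, k)      # analogous to a k-mer database built from reads
    asm_db = count_kmers(assembly, k)
    print("N50:", n50([len(s) for s in assembly]))
    print("k-mer completeness:", kmer_completeness(read_db, asm_db))
```

The split between building `read_db` once and reusing it to score an assembly loosely mirrors the Build Mode and Evaluate Mode division described in the Summary, although the actual databases and metrics GEP computes may differ from this sketch.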
