James Sullivan, Diego De Panis, Valentina Galeone, Camila J Mazzoni
Genome Evaluation Pipeline (GEP): a fully automated quality control tool for parallel evaluation of genome assemblies
Bioinformatics Advances, volume 5, issue 1, vbaf147. Published 2025-06-26. DOI: https://doi.org/10.1093/bioadv/vbaf147
Abstract
Summary: The ability to generate high-quality genome assemblies is paramount for understanding global biodiversity. To streamline quality control of the large number of assemblies currently being generated, we developed the Genome Evaluation Pipeline (GEP). Our Snakemake-based command-line tool has two modes and was designed to follow the recommendations of several international projects that aim to standardize genome evaluation across the Tree of Life. In Build Mode, GEP generates k-mer databases from high-accuracy sequencing reads, with optional quality control and pre-processing steps. Evaluate Mode uses these databases to assess genome assembly quality with standard, gene-content, and k-mer-based metrics. Key features include the assessment of genome characteristics such as size, heterozygosity, and ploidy without reference sequences, and the comparison of k-mer profiles to evaluate assembly completeness and correctness. GEP also supports flexible input options for k-mer databases and the integration of Hi-C data for visual inspection of the assembly structure. Finally, our tool produces comprehensive reports summarizing contiguity, completeness, and correctness metrics of genome assemblies, facilitating their comparison and selection for downstream studies.
Availability and implementation: GEP is publicly available as a Git repository at https://git.imp.fu-berlin.de/begendiv/gep.
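
To make two of the metric families named in the Summary concrete, the following is a minimal Python sketch of a contiguity metric (N50) and a toy k-mer completeness score. It is an illustration only and is not GEP's implementation: the function names (`n50`, `kmer_completeness`), the `min_count` threshold, and the decision to ignore reverse complements are all assumptions made for this example.

```python
from collections import Counter


def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def kmers(seq, k):
    """Yield every k-mer (substring of length k) in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_completeness(reads, contigs, k, min_count=2):
    """Toy completeness: fraction of 'solid' read k-mers found in the assembly.

    A read k-mer is treated as solid when it occurs at least min_count
    times (hypothetical threshold). Reverse complements are ignored here
    for brevity, which a real tool would not do.
    """
    read_counts = Counter(km for read in reads for km in kmers(read, k))
    solid = {km for km, count in read_counts.items() if count >= min_count}
    if not solid:
        return 0.0
    assembly_kmers = {km for contig in contigs for km in kmers(contig, k)}
    return len(solid & assembly_kmers) / len(solid)


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
    print(kmer_completeness(["ACGTAC", "ACGTAC"], ["ACGTA"], k=3))
```

A score below 1.0 from `kmer_completeness` means that some k-mers well supported by the reads are missing from the assembly, which is the intuition behind reference-free completeness estimates based on k-mer profiles.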