Navigating Eukaryotic Genome Annotation Pipelines: A Route Map to BRAKER, Galba, and TSEBRA
Tomáš Brůna, Lars Gabriel, Katharina J. Hoff
arXiv - QuanBio - Genomics, published 2024-03-28
DOI: arxiv-2403.19416 (https://doi.org/arxiv-2403.19416)
Abstract
Annotating the structure of protein-coding genes represents a major challenge
in the analysis of eukaryotic genomes. This task sets the groundwork for
subsequent genomic studies aimed at understanding the functions of individual
genes. BRAKER and Galba are two fully automated and containerized pipelines
designed to perform accurate genome annotation. BRAKER integrates the
GeneMark-ETP and AUGUSTUS gene finders, employing the TSEBRA combiner to attain
high sensitivity and precision. BRAKER is adept at handling genomes of any
size, provided that it has access to both transcript expression sequencing data
and an extensive protein database from the target clade. BRAKER also achieves
high accuracy with only one of these extrinsic evidence sources, although
accuracy diminishes for larger genomes under such conditions. In contrast,
Galba takes a different approach: it uses direct protein-to-genome spliced
alignments from miniprot to generate training genes and evidence for gene
prediction in AUGUSTUS. Galba has superior accuracy in large genomes if protein sequences are
the only source of evidence. This chapter provides practical guidelines for
employing both pipelines in the annotation of eukaryotic genomes, with a focus
on insect genomes.
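
The selection guidance in the abstract can be summarized as a small decision helper. This is only an illustrative sketch: the function name `suggest_pipeline` and its parameters are hypothetical, and because the abstract gives no numeric threshold for a "large" genome, that judgment is left to the caller as a boolean.

```python
def suggest_pipeline(has_rnaseq: bool, has_proteins: bool, large_genome: bool) -> str:
    """Suggest a pipeline based on the guidance stated in the abstract.

    Hypothetical helper for illustration; not part of BRAKER or Galba.
    `large_genome` is the caller's own judgment (no threshold is given
    in the abstract).
    """
    if not (has_rnaseq or has_proteins):
        # Both pipelines rely on extrinsic evidence as described here.
        raise ValueError("at least one evidence source is required")

    if has_rnaseq and has_proteins:
        # With both evidence types, BRAKER handles genomes of any size.
        return "BRAKER"

    if has_proteins and large_genome:
        # Proteins only, large genome: Galba is reported as more accurate.
        return "Galba"

    # One evidence type otherwise: BRAKER remains accurate, though the
    # abstract notes that accuracy diminishes for larger genomes.
    return "BRAKER"


if __name__ == "__main__":
    print(suggest_pipeline(has_rnaseq=False, has_proteins=True, large_genome=True))
```

The RNA-Seq-only case with a large genome still returns BRAKER here, since Galba as described works from protein evidence; reduced accuracy should be expected in that setting.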