Enhanced bovine genome annotation through integration of transcriptomics and epi-transcriptomics datasets facilitates genomic biology

IF 11.8 2区生物学 Q1 MULTIDISCIPLINARY SCIENCES

GigaScience Pub Date : 2024-04-16 DOI:10.1093/gigascience/giae019

Hamid Beiki, Brenda M Murdoch, Carissa A Park, Chandlar Kern, Denise Kontechy, Gabrielle Becker, Gonzalo Rincon, Honglin Jiang, Huaijun Zhou, Jacob Thorne, James E Koltes, Jennifer J Michal, Kimberly Davenport, Monique Rijnkels, Pablo J Ross, Rui Hu, Sarah Corum, Stephanie McKay, Timothy P L Smith, Wansheng Liu, Wenzhi Ma, Xiaohui Zhang, Xiaoqing Xu, Xuelei Han, Zhihua Jiang, Zhi-Liang Hu, James M Reecy

{"title":"Enhanced bovine genome annotation through integration of transcriptomics and epi-transcriptomics datasets facilitates genomic biology","authors":"Hamid Beiki, Brenda M Murdoch, Carissa A Park, Chandlar Kern, Denise Kontechy, Gabrielle Becker, Gonzalo Rincon, Honglin Jiang, Huaijun Zhou, Jacob Thorne, James E Koltes, Jennifer J Michal, Kimberly Davenport, Monique Rijnkels, Pablo J Ross, Rui Hu, Sarah Corum, Stephanie McKay, Timothy P L Smith, Wansheng Liu, Wenzhi Ma, Xiaohui Zhang, Xiaoqing Xu, Xuelei Han, Zhihua Jiang, Zhi-Liang Hu, James M Reecy","doi":"10.1093/gigascience/giae019","DOIUrl":null,"url":null,"abstract":"Background The accurate identification of the functional elements in the bovine genome is a fundamental requirement for high-quality analysis of data informing both genome biology and genomic selection. Functional annotation of the bovine genome was performed to identify a more complete catalog of transcript isoforms across bovine tissues. Results A total of 160,820 unique transcripts (50% protein coding) representing 34,882 unique genes (60% protein coding) were identified across tissues. Among them, 118,563 transcripts (73% of the total) were structurally validated by independent datasets (PacBio isoform sequencing data, Oxford Nanopore Technologies sequencing data, de novo assembled transcripts from RNA sequencing data) and comparison with Ensembl and NCBI gene sets. In addition, all transcripts were supported by extensive data from different technologies such as whole transcriptome termini site sequencing, RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression, chromatin immunoprecipitation sequencing, and assay for transposase-accessible chromatin using sequencing. A large proportion of identified transcripts (69%) were unannotated, of which 86% were produced by annotated genes and 14% by unannotated genes. A median of two 5′ untranslated regions were expressed per gene. Around 50% of protein-coding genes in each tissue were bifunctional and transcribed both coding and noncoding isoforms. Furthermore, we identified 3,744 genes that functioned as noncoding genes in fetal tissues but as protein-coding genes in adult tissues. Our new bovine genome annotation extended more than 11,000 annotated gene borders compared to Ensembl or NCBI annotations. The resulting bovine transcriptome was integrated with publicly available quantitative trait loci data to study tissue–tissue interconnection involved in different traits and construct the first bovine trait similarity network. Conclusions These validated results show significant improvement over current bovine genome annotations.","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":"19 1","pages":""},"PeriodicalIF":11.8000,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"GigaScience","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/gigascience/giae019","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}

引用次数: 0

Abstract

Background The accurate identification of the functional elements in the bovine genome is a fundamental requirement for high-quality analysis of data informing both genome biology and genomic selection. Functional annotation of the bovine genome was performed to identify a more complete catalog of transcript isoforms across bovine tissues. Results A total of 160,820 unique transcripts (50% protein coding) representing 34,882 unique genes (60% protein coding) were identified across tissues. Among them, 118,563 transcripts (73% of the total) were structurally validated by independent datasets (PacBio isoform sequencing data, Oxford Nanopore Technologies sequencing data, de novo assembled transcripts from RNA sequencing data) and comparison with Ensembl and NCBI gene sets. In addition, all transcripts were supported by extensive data from different technologies such as whole transcriptome termini site sequencing, RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression, chromatin immunoprecipitation sequencing, and assay for transposase-accessible chromatin using sequencing. A large proportion of identified transcripts (69%) were unannotated, of which 86% were produced by annotated genes and 14% by unannotated genes. A median of two 5′ untranslated regions were expressed per gene. Around 50% of protein-coding genes in each tissue were bifunctional and transcribed both coding and noncoding isoforms. Furthermore, we identified 3,744 genes that functioned as noncoding genes in fetal tissues but as protein-coding genes in adult tissues. Our new bovine genome annotation extended more than 11,000 annotated gene borders compared to Ensembl or NCBI annotations. The resulting bovine transcriptome was integrated with publicly available quantitative trait loci data to study tissue–tissue interconnection involved in different traits and construct the first bovine trait similarity network. Conclusions These validated results show significant improvement over current bovine genome annotations.

查看原文本刊更多论文

通过整合转录组学和表观转录组学数据集加强牛基因组注释，促进基因组生物学发展

背景准确识别牛基因组中的功能元件是高质量分析数据、为基因组生物学和基因组选择提供信息的基本要求。我们对牛基因组进行了功能注释，以确定牛组织中更完整的转录本异构体目录。结果在各组织中共鉴定出 160,820 个独特的转录本（50% 蛋白编码），代表 34,882 个独特的基因（60% 蛋白编码）。其中，118,563 个转录本（占总数的 73%）通过独立数据集（PacBio 异构体测序数据、牛津纳米孔技术测序数据、从 RNA 测序数据中重新组装的转录本）以及与 Ensembl 和 NCBI 基因集的比较进行了结构验证。此外，所有转录本都有来自不同技术的大量数据支持，如全转录本组末端位点测序、用于基因表达分析的 RNA 注释和启动子图谱、染色质免疫沉淀测序，以及使用测序法检测转座酶可进入染色质。鉴定出的转录本中有很大一部分（69%）是未注释的，其中 86% 由已注释基因产生，14% 由未注释基因产生。每个基因表达的 5′非翻译区中位数为两个。每个组织中约有 50%的蛋白编码基因具有双重功能，同时转录编码和非编码同工酶。此外，我们还发现 3744 个基因在胎儿组织中作为非编码基因，但在成年组织中作为蛋白编码基因。与 Ensembl 或 NCBI 的注释相比，我们的新牛基因组注释扩展了 11,000 多个注释基因边界。我们将得到的牛转录组与公开的定量性状位点数据整合在一起，以研究不同性状所涉及的组织-组织之间的相互联系，并构建了第一个牛性状相似性网络。结论这些验证结果表明，与目前的牛基因组注释相比，牛基因组注释有了显著改善。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

GigaScience MULTIDISCIPLINARY SCIENCES-

CiteScore

15.50

自引率

1.10%

发文量

119

审稿时长

1 weeks

期刊介绍： GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.