GTDrift: a resource for exploring the interplay between genetic drift, genomic and transcriptomic characteristics in eukaryotes.

Impact factor: 4.0 | JCR: Q1 | Category: Genetics & Heredity
NAR Genomics and Bioinformatics | Pub Date: 2024-06-12 | eCollection Date: 2024-06-01 | DOI: 10.1093/nargab/lqae064
Florian Bénitière, Laurent Duret, Anamaria Necsulea
Volume 6, issue 2, article lqae064. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11167491/pdf/
Citations: 0

Abstract


We present GTDrift, a comprehensive data resource that enables explorations of genomic and transcriptomic characteristics alongside proxies of the intensity of genetic drift in individual species. This resource encompasses data for 1506 eukaryotic species, including 1413 animals and 93 green plants, and is organized in three components. The first two components contain approximations of the effective population size, which serve as indicators of the extent of random genetic drift within each species. In the first component, we meticulously investigated public databases to assemble data on life history traits such as longevity, adult body length and body mass for a set of 979 species. The second component includes estimations of the ratio between the rate of non-synonymous substitutions and the rate of synonymous substitutions (dN/dS) in protein-coding sequences for 1324 species. This ratio provides an estimate of the efficiency of natural selection in purging deleterious substitutions. Additionally, we present polymorphism-derived N_e estimates for 66 species. The third component encompasses various genomic and transcriptomic characteristics. With this component, we aim to facilitate comparative transcriptomics analyses across species, by providing easy-to-use processed data for more than 16,000 RNA-seq samples across 491 species. These data include intron-centered alternative splicing frequencies, gene expression levels and sequencing depth statistics for each species, obtained with a homogeneous analysis protocol. To enable cross-species comparisons, we provide orthology predictions for conserved single-copy genes based on BUSCO gene sets. To illustrate the possible uses of this database, we identify the most frequently used introns for each gene and we assess how the sequencing depth available for each species affects our power to identify major and minor splice variants.
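The last use case in the abstract, identifying major and minor splice variants and seeing how sequencing depth limits that, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' protocol or the resource's file format. The record layout `(gene_id, donor, acceptor, reads)`, the function name `classify_introns`, the grouping of competing introns by shared donor site, and the `min_reads` threshold are all hypothetical.

```python
from collections import defaultdict


def classify_introns(records, min_reads=10):
    """Label competing introns as 'major' or 'minor'.

    records: iterable of (gene_id, donor, acceptor, reads) tuples.
    This layout is hypothetical; it is not the GTDrift file format.
    Introns sharing a gene and donor site are treated as competitors
    (a simplifying assumption made for this sketch).
    """
    groups = defaultdict(list)
    for gene_id, donor, acceptor, reads in records:
        groups[(gene_id, donor)].append((acceptor, reads))

    calls = {}
    for (gene_id, donor), introns in groups.items():
        total = sum(reads for _, reads in introns)
        if total < min_reads:
            # Too few spliced reads: no call is made for this group,
            # which is how low sequencing depth reduces detection power.
            continue
        # The intron with the most reads in the group is called major.
        top_acceptor = max(introns, key=lambda item: item[1])[0]
        for acceptor, reads in introns:
            label = "major" if acceptor == top_acceptor else "minor"
            calls[(gene_id, donor, acceptor)] = (reads / total, label)
    return calls


# Made-up counts, for illustration only.
example = [
    ("g1", 100, 200, 90),
    ("g1", 100, 250, 10),
    ("g1", 300, 400, 3),
    ("g2", 50, 80, 40),
]
for key, (freq, label) in sorted(classify_introns(example).items()):
    print(key, round(freq, 2), label)
```

Raising `min_reads` in this sketch removes calls for weakly covered groups, which mirrors, in a very simplified way, the dependence on sequencing depth that the abstract describes.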

Source journal metrics: CiteScore 8.00; self-citation rate 2.20%; articles published: 95; review time: 15 weeks.