Identification of universal grass genes and estimates of their monocot-/commelinid-/grass-specificity.

Bioinformatics Advances (IF 2.4; JCR Q2, Mathematical & Computational Biology)
Pub Date: 2025-04-07; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf079
Rowan A C Mitchell

Abstract

Motivation: Where experiments identify sets of grass genes of unknown function, e.g. underlying a QTL or co-expressed in a transcriptome, it is useful to know which of these genes are common to all grasses (universal) and whether they likely have monocot-/commelinid-/grass-specific function.

Results: A pipeline used data on 16 full grass genomes from Ensembl Plants to generate 13 312 highly conserved, universal groups of grass protein-coding genes. Validation steps showed that 98.8% of these groups also had gene matches in recently sequenced genomes from two major grass clades not used in the pipeline. Comparison with many non-grass genomes identified 4609 of these groups as likely to have monocot-/commelinid-/grass-specific function. Both the grouping of genes and their specificity were defined using hidden Markov model (HMM) profiles of the groups. The HMM-based approach performed better than simple percentage identity in discriminating between test sets of known specific and non-specific genes. The results give novel insight into the nature of monocot-/commelinid-/grass-specific genes. Researchers can use the universal_grass_peps database to gain evidence that their experimentally identified grass genes are involved in monocot-/commelinid-/grass-specific traits.
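
The two decisions described in the Results (is a group universal across grasses, and is it likely clade-specific) can be illustrated with a minimal Python sketch. This is not the author's pipeline: the abstract gives no thresholds or file formats, so the `GroupScores` layout, the `min_score` and `ratio` cut-offs, and the genome names are all assumptions made for illustration. The sketch assumes that, for each group, the best profile-HMM bit score per genome has already been computed by some external search tool.

```python
from dataclasses import dataclass, field


@dataclass
class GroupScores:
    """Best profile-HMM bit score per genome for one gene group (hypothetical layout)."""
    grass: dict[str, float]                                    # grass genome -> best bit score
    outgroup: dict[str, float] = field(default_factory=dict)   # non-grass genome -> best bit score


def is_universal(group: GroupScores, grass_genomes: list[str], min_score: float) -> bool:
    """True if every listed grass genome has a hit scoring at least min_score."""
    return all(group.grass.get(g, 0.0) >= min_score for g in grass_genomes)


def is_specific(group: GroupScores, ratio: float) -> bool:
    """True if the best outgroup score is below ratio * the weakest grass score."""
    weakest_grass = min(group.grass.values(), default=0.0)
    best_outside = max(group.outgroup.values(), default=0.0)
    return best_outside < ratio * weakest_grass


def classify(groups: dict[str, GroupScores], grass_genomes: list[str],
             min_score: float = 50.0, ratio: float = 0.5) -> dict[str, str]:
    """Label each group; min_score and ratio are illustrative, not from the paper."""
    labels = {}
    for name, group in groups.items():
        if not is_universal(group, grass_genomes, min_score):
            labels[name] = "not universal"
        elif is_specific(group, ratio):
            labels[name] = "universal, likely specific"
        else:
            labels[name] = "universal, non-specific"
    return labels


# Toy data: three grass genomes and one non-grass genome (names are placeholders).
grass_genomes = ["wheat", "rice", "maize"]
groups = {
    "grpA": GroupScores({"wheat": 300, "rice": 280, "maize": 290}, {"arabidopsis": 60}),
    "grpB": GroupScores({"wheat": 300, "rice": 280, "maize": 290}, {"arabidopsis": 250}),
    "grpC": GroupScores({"wheat": 300, "rice": 280}),
}
labels = classify(groups, grass_genomes)
```

With this toy data, `grpA` is labelled universal and likely specific, `grpB` universal but non-specific, and `grpC` not universal because it has no maize hit. A fuller version would need separate outgroup sets for the monocot, commelinid and grass levels, which the abstract distinguishes but does not detail.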

Availability and implementation: The universal_grass_peps database is available for download at https://data.rothamsted.ac.uk/dataset/universal_grass_peps.
