Identification of universal grass genes and estimates of their monocot-/commelinid-/grass-specificity.
Author: Rowan A C Mitchell
Journal: Bioinformatics advances, volume 5, issue 1, article vbaf079
DOI: https://doi.org/10.1093/bioadv/vbaf079
Published: 2025-04-07
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12098945/pdf/
Abstract
Motivation: Where experiments identify sets of grass genes of unknown function, e.g. underlying a QTL or co-expressed in a transcriptome, it is useful to know which of these genes are common to all grasses (universal) and whether they likely have monocot-/commelinid-/grass-specific function.
Results: A pipeline used data from 16 full grass genomes in Ensembl Plants to generate 13 312 highly conserved, universal groups of grass protein-coding genes. Validation steps showed that 98.8% of these groups also had gene matches in recently sequenced genomes from two major grass clades not used in the pipeline. Comparison with many non-grass genomes identified 4609 of these groups as likely to have monocot-/commelinid-/grass-specific function. Both the grouping of genes and their specificity were defined using hidden Markov model (HMM) profiles of the groups. The HMM-based approach performed better than simple percentage identity in discriminating between test sets of known specific and non-specific genes. The results give novel insight into the nature of monocot-/commelinid-/grass-specific genes. Researchers can use the universal_grass_peps database to gain evidence that their experimentally identified grass genes are involved in monocot-/commelinid-/grass-specific traits.
Availability and implementation: The universal_grass_peps database is available for download at https://data.rothamsted.ac.uk/dataset/universal_grass_peps.
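As a rough illustration of the "universal" criterion described in the abstract (a gene group common to all grasses), the sketch below keeps only those groups that have a member in every genome of a reference set. It is not the author's pipeline: the abstract says that grouping and specificity were defined with HMM profiles but does not give the criteria, so the function name `universal_groups`, the toy genome labels, and the group IDs are all hypothetical. The final line only restates the abstract's own figures (4609 of 13 312 groups) as a fraction.

```python
from typing import Dict, Set


def universal_groups(group_to_genomes: Dict[str, Set[str]],
                     genomes: Set[str]) -> Set[str]:
    """Return IDs of groups with at least one member gene in every genome.

    Assumption: group membership per genome is already known; how genes are
    assigned to groups (HMM profiles in the paper) is outside this sketch.
    """
    return {
        group_id
        for group_id, present_in in group_to_genomes.items()
        if genomes <= present_in  # every reference genome is represented
    }


# Hypothetical toy data: three made-up genomes and three made-up groups.
genomes = {"genome_A", "genome_B", "genome_C"}
groups = {
    "grp1": {"genome_A", "genome_B", "genome_C"},
    "grp2": {"genome_A", "genome_C"},  # missing genome_B, so not universal
    "grp3": {"genome_A", "genome_B", "genome_C", "outgroup_X"},
}
universal = universal_groups(groups, genomes)

# Share of universal groups reported as likely clade-specific,
# using the two counts given in the abstract.
specific_fraction = 4609 / 13312
print(sorted(universal), f"{specific_fraction:.1%}")
```

On the toy data, `grp2` is dropped because it lacks a member in one genome. `grp3` is retained even though it also occurs in an outgroup, because universality and clade-specificity are separate questions in the abstract.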