RNAirport: a deep neural network-based database characterizing representative gene models in plants
Authors: Sitao Zhu, Shu Yuan, Ruixia Niu, Yulu Zhou, Zhao Wang, Guoyong Xu
Journal: Journal of Genetics and Genomics (impact factor 6.6; JCR Q1, Biochemistry & Molecular Biology)
DOI: 10.1016/j.jgg.2024.03.004 (https://doi.org/10.1016/j.jgg.2024.03.004)
Published: 2024-06-01 (Epub 2024-03-20)
Citation count: 0
Abstract
A 5'-leader, originally known as the 5'-untranslated region, can exist as multiple isoforms because of alternative splicing (aS) and alternative transcription start site (aTSS) usage. A representative 5'-leader is therefore needed to examine how the embedded RNA regulatory elements control translation efficiency. Here, we develop a ranking algorithm and a deep-learning model to annotate representative 5'-leaders for five plant species. We rank the intra-sample and inter-sample frequencies of aS-mediated transcript isoforms using a Kruskal-Wallis test-based algorithm and identify the representative aS-5'-leader. To further assign a representative 5'-end, we train the deep-learning model 5'leaderP to learn aTSS-mediated 5'-end distribution patterns from cap analysis of gene expression (CAGE) data. The model accurately predicts the 5'-end, as confirmed experimentally in Arabidopsis and rice. The gene models containing representative 5'-leaders, together with 5'leaderP, can be accessed at RNAirport (http://www.rnairport.com/leader5P/). This Stage 1 annotation records 5'-leader diversity and will pave the way to Ribo-Seq open-reading frame annotation, similar to the project recently initiated by human GENCODE.
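The abstract does not spell out how the Kruskal-Wallis test-based ranking is implemented, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each aS isoform of a gene has a usage fraction per sample, pools those values, computes the Kruskal-Wallis H statistic (without tie correction), and picks the isoform with the highest mean rank as the representative one. The function names, isoform labels, and numbers are all hypothetical.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (H statistic without tie correction, mean rank per group)."""
    pooled = list(chain.from_iterable(groups.values()))
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_sum = 0.0
    mean_ranks = {}
    start = 0
    for name, vals in groups.items():
        n = len(vals)
        rank_sum = sum(ranks[start:start + n])
        start += n
        mean_ranks[name] = rank_sum / n
        weighted_sum += rank_sum ** 2 / n
    h = 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)
    return h, mean_ranks


# Hypothetical usage fractions of three aS isoforms across four samples.
isoform_fractions = {
    "isoform_A": [0.70, 0.65, 0.80, 0.75],
    "isoform_B": [0.20, 0.25, 0.15, 0.20],
    "isoform_C": [0.10, 0.10, 0.05, 0.05],
}

h_stat, mean_ranks = kruskal_wallis_h(isoform_fractions)
# Assumption: the representative isoform is the one with the highest mean rank.
representative = max(mean_ranks, key=mean_ranks.get)
print(f"H = {h_stat:.3f}, representative = {representative}")
```

In practice a library routine such as `scipy.stats.kruskal` would also return a p-value and apply tie correction; the hand-rolled version here only keeps the sketch dependency-free.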
About the journal:
The Journal of Genetics and Genomics (JGG, formerly known as Acta Genetica Sinica) is an international journal publishing peer-reviewed articles of novel and significant discoveries in the fields of genetics and genomics. Topics of particular interest include, but are not limited to, molecular genetics, developmental genetics, cytogenetics, epigenetics, medical genetics, population and evolutionary genetics, genomics and functional genomics, as well as bioinformatics and computational biology.