Ying Sun, W. L. Rogers, K. Devos, Liming Cai, R. Malmberg
IEEE International Conference on Computational Advances in Bio and Medical Sciences, published 2016-10-01. DOI: https://doi.org/10.1109/ICCABS.2016.7802791
Genome-wide identification and evolutionary analysis of long non-coding RNAs in cereals
We computationally identified lncRNA candidates in four economically important cereals (Poaceae): 7,196 in Zea mays, 1,974 in Sorghum bicolor, 4,236 in Setaria italica and 2,542 in Oryza sativa; we then compared these RNAs across the species. Our approach involved screening a reference-guided transcriptome assembly of RNA-Seq data for RNAs that were at least 200 bases in length, had open reading frames of at most 70 amino acids, and lacked homology to entries in the UniProt database. A sequence composition analysis of the lncRNA candidates, in comparison to protein-coding transcripts, highlighted distinctive features, including a low GC content, a paucity of introns and a hexamer usage bias, consistent with what has been found for mammalian lncRNAs. RepeatMasker identified from 1% (rice) to 19% (maize) of the candidate lncRNAs as being transcribed from transposable elements, based on a dataset with 3,853 transposable elements. We compared the candidate lncRNAs with 25,141 miRNAs from miRBase, and found that fewer than 1% of them could be potential miRNA precursors. The cross-species comparisons, which included a sequence- and structure-based lncRNA homology search, synteny analysis, and lncRNA secondary structure prediction, uncovered only limited sequence similarity. In sub-regions, we predicted conserved secondary structures using covariation analysis. We used the comparative sequence and synteny analyses to predict the existence of lncRNAs in S. italica; experimental tests confirmed the presence of these RNAs. Our results are consistent with a model of very rapid evolution of lncRNAs.
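The length and open-reading-frame criteria described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, and the ORF definition (ATG-initiated, searched in all six frames, with ORFs that run off the end of the transcript also counted) is an assumption, since the abstract does not specify it. The UniProt homology screen is not reproduced here; it would need an external search tool. A simple GC-content helper is included because low GC content is one of the reported distinguishing features.

```python
MIN_LENGTH = 200   # minimum transcript length in bases (from the abstract)
MAX_ORF_AA = 70    # maximum ORF length in amino acids (from the abstract)

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_aa(seq):
    """Length in amino acids of the longest ATG-initiated ORF in six frames.

    Assumption: ORFs start at ATG, the stop codon is not counted, and an
    ORF that reaches the end of the transcript without a stop still counts.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            current = None  # None means we are not inside an ORF
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if current is None:
                    if codon == "ATG":
                        current = 1  # count the initiating methionine
                elif codon in STOP_CODONS:
                    best = max(best, current)
                    current = None
                else:
                    current += 1
            if current is not None:  # open-ended ORF
                best = max(best, current)
    return best


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_length_and_orf_filters(seq):
    """Apply only the length and ORF criteria; homology is not checked."""
    return len(seq) >= MIN_LENGTH and longest_orf_aa(seq) <= MAX_ORF_AA


if __name__ == "__main__":
    # Toy sequences, made up purely for illustration.
    candidate = "ACGT" * 60                     # 240 bases, no ATG
    coding = "ATG" + "GCT" * 100 + "TAA"        # 101-codon ORF
    print(passes_length_and_orf_filters(candidate))
    print(passes_length_and_orf_filters(coding))
```

Transcripts that pass these two checks would still need the homology screen before being treated as lncRNA candidates in the sense used above.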