Comprehensive Stress-Based De Novo Transcriptome Assembly and Annotation of Guar (Cyamopsis tetragonoloba (L.) Taub.): An Important Industrial and Forage Crop
F. Al-Qurainy, Aref Alshameri, A. Gaafar, Salim Khan, M. Nadeem, A. Alameri, M. Tarroum, M. Ashraf
International Journal of Genomics. Published 2019-10-08. DOI: 10.1155/2019/7295859 (https://doi.org/10.1155/2019/7295859). Impact factor 2.6; JCR Q3, Biochemistry & Molecular Biology.
Citations: 24
Abstract
The forage crop Guar (Cyamopsis tetragonoloba (L.) Taub.) can endure heat, drought, and mild salinity. A complete picture of its genic architecture will improve our understanding of gene expression networks and of the different tolerance mechanisms at the molecular level. Therefore, whole-mRNA sequencing of the Guar plant was conducted to provide a snapshot of the mRNA content of the cell under salinity, heat, and drought stresses, to be integrated with previous transcriptomic studies. RNA-Seq was used to perform 2 × 100 paired-end sequencing on an Illumina HiSeq 2500 platform for the transcriptome of leaves of C. tetragonoloba under normal, heat, drought, and salinity conditions. Trinity was used for de novo assembly, followed by gene annotation, functional classification, metabolic pathway analysis, and identification of SSR markers. A total of 218.2 million paired-end raw reads (~44 Gbp) were generated. Of those, 193.5 million high-quality paired-end reads were used to reconstruct 161,058 transcripts (~266 Mbp) with an N50 of 2552 bp, representing 61,508 putative genes. There were 6463 proteins with >90% full-length coverage against the Swiss-Prot database, and 94% of orthologs were complete against Embryophyta. Approximately 62.87% of transcripts had BLAST hits, 50.46% were mapped, and 43.50% were annotated. A total of 4715 InterProScan families, 3441 domains, 74 repeats, and 490 sites were detected. Biological processes, molecular functions, and cellular components comprised 64.12%, 25.42%, and 10.4%, respectively. The transcriptome was associated with 985 enzymes and 156 KEGG pathways. A total of 27,066 SSRs were identified, with an average frequency of one SSR per 9.825 kb in the assembled transcripts. The resulting data will be helpful for further analysis of multi-stress tolerance in Guar.
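The headline figures in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' pipeline: it assumes that "218.2 million paired-end reads" counts read pairs of 2 × 100 bp, it uses the rounded assembly size of ~266 Mbp, and it applies the standard N50 definition to a made-up list of lengths. The function name `n50` and all variable names are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy transcript lengths (invented for illustration, not from the paper).
toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)

# Raw yield, assuming 218.2 million read pairs with two 100 bp reads each.
raw_gbp = 218.2e6 * 2 * 100 / 1e9

# SSR density from the rounded assembly size (~266 Mbp) and 27,066 SSRs.
kb_per_ssr = (266e6 / 1e3) / 27_066

print(f"toy N50: {toy_n50} bp")
print(f"raw yield: {raw_gbp:.2f} Gbp")
print(f"one SSR per {kb_per_ssr:.2f} kb")
```

Under these assumptions the raw yield comes to about 43.6 Gbp, in line with the reported ~44 Gbp, and the SSR density comes to roughly one per 9.83 kb, close to the reported 9.825 kb; the small gap is expected because 266 Mbp is itself a rounded value.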
About the journal:
International Journal of Genomics is a peer-reviewed, Open Access journal that publishes research articles as well as review articles in all areas of genome-scale analysis. Topics covered by the journal include, but are not limited to: bioinformatics, clinical genomics, disease genomics, epigenomics, evolutionary genomics, functional genomics, genome engineering, and synthetic genomics.