Discovery and annotation of small proteins using genomics, proteomics, and computational approaches.

Genome Research Pub Date: 2011-04-01 Epub Date: 2011-03-02 DOI: 10.1101/gr.109280.110
Xiaohan Yang, Timothy J Tschaplinski, Gregory B Hurst, Sara Jawdy, Paul E Abraham, Patricia K Lankford, Rachel M Adams, Manesh B Shah, Robert L Hettich, Erika Lindquist, Udaya C Kalluri, Lee E Gunter, Christa Pennacchio, Gerald A Tuskan
Citations: 109

Abstract

Small proteins (10-200 amino acids [aa] in length) encoded by short open reading frames (sORFs) play important regulatory roles in various biological processes, including tumor progression, stress response, flowering, and hormone signaling. However, ab initio discovery of small proteins has been relatively overlooked. Recent advances in deep transcriptome sequencing make it possible to efficiently identify sORFs at the genome level. In this study, we obtained ~2.6 million expressed sequence tag (EST) reads from the Populus deltoides leaf transcriptome and reconstructed full-length transcripts from the EST sequences. We identified an initial set of 12,852 sORFs encoding proteins of 10-200 aa in length. Three computational approaches were then used to enrich for bona fide protein-coding sORFs from the initial sORF set: (1) coding-potential prediction, (2) evolutionary conservation between P. deltoides and other plant species, and (3) gene family clustering within P. deltoides. As a result, a high-confidence sORF candidate set containing 1469 genes was obtained. Analysis of protein domains, non-protein-coding RNA motifs, sequence length distribution, and protein mass spectrometry data supported this high-confidence sORF set. Within the high-confidence set, known protein domains were identified in 1282 genes (the higher-confidence sORF candidate set), of which 611 genes, designated the highest-confidence sORF candidate set, were supported by proteomics data. Of the 611 highest-confidence candidate sORF genes, 56 were new to the current Populus genome annotation. This study not only demonstrates that there are potential sORF candidates still to be annotated in sequenced genomes, but also presents an efficient strategy for discovering sORFs in species for which no genome annotation is yet available.
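
The abstract does not say how the initial sORF set was extracted from the reconstructed transcripts, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes an ORF starts at the first ATG after the previous in-frame stop, ends at a standard stop codon (TAA, TAG, TGA), and is scanned on the forward strand only; the function name `find_sorfs` and the `SORF` record are hypothetical. The only value taken from the abstract is the 10-200 aa length window.

```python
from typing import Iterator, NamedTuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
MIN_AA, MAX_AA = 10, 200             # length window stated in the abstract


class SORF(NamedTuple):
    frame: int      # reading frame (0, 1, or 2) on the forward strand
    start: int      # 0-based index of the A in the start codon
    end: int        # index one past the last base of the stop codon
    aa_length: int  # codons from the start codon up to, not including, the stop


def find_sorfs(transcript: str) -> Iterator[SORF]:
    """Yield complete ATG-to-stop ORFs whose protein length is 10-200 aa."""
    seq = transcript.upper().replace("U", "T")
    for frame in range(3):
        orf_start = None
        # Walk the sequence codon by codon in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None:
                if codon == "ATG":
                    orf_start = i
            elif codon in STOP_CODONS:
                aa_length = (i - orf_start) // 3
                if MIN_AA <= aa_length <= MAX_AA:
                    yield SORF(frame, orf_start, i + 3, aa_length)
                orf_start = None  # resume searching after this stop codon
```

A scan like this only produces candidates by length. The three enrichment steps named in the abstract (coding-potential prediction, cross-species conservation, and gene family clustering) would be applied afterwards and are not modeled here.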
