Chapter 15. Proteogenomics: Proteomics for Genome Annotation

F. Ghali, A. Jones
Proteome Informatics, published 2016-11-15
DOI: 10.1039/9781782626732-00365 (https://doi.org/10.1039/9781782626732-00365)
Citations: 0

Abstract

One of the major bottlenecks in omics biology is the generation of accurate gene models, including correct calling of the start codon, of intron splicing (taking alternative splicing into account) and of the stop codon; this process is collectively called genome annotation. Current annotation approaches for newly sequenced genomes are generally automated or semi-automated. They usually combine gene-finding software that searches the DNA sequence for intrinsic gene-like signatures (motifs), the propagation of annotations from related, better-annotated species, and the mapping of experimental data sets, particularly from RNA sequencing (RNA-Seq). Large-scale proteomics data can also play an important role in confirming and correcting gene models. Although proteomics approaches are generally less sensitive than RNA-Seq, they have the advantage of providing evidence that a predicted gene or transcript is indeed protein-coding. The use of proteomics data for genome annotation is called proteogenomics, and it forms the basis of this chapter. We describe its theoretical underpinnings, the software packages that have been developed for proteogenomics, statistical approaches for validating the evidence, and the support for proteogenomics data in file formats, standards and databases.