Chapter 15. Proteogenomics: Proteomics for Genome Annotation

Proteome Informatics. Pub Date: 2016-11-15. DOI: 10.1039/9781782626732-00365

F. Ghali, A. Jones


Citations: 0

Abstract



One of the major bottlenecks in omics biology is genome annotation: the generation of accurate gene models, including correct calling of the start codon, the splicing of introns (taking account of alternative splicing), and the stop codon. Current annotation approaches for newly sequenced genomes are generally automated or semi-automated. They usually combine gene-finding software that looks for intrinsic gene-like signatures (motifs) in the DNA sequence, the propagation of annotations from related, better-annotated species, and the mapping of experimental data sets, particularly from RNA sequencing (RNA-Seq). Large-scale proteomics data can also play an important role in confirming and correcting gene models. Although proteomics approaches tend to be less sensitive than RNA-Seq, they have the advantage of providing evidence that a predicted gene or transcript is indeed protein-coding. The use of proteomics data for genome annotation is called proteogenomics, and it forms the basis for this chapter. We describe the theoretical underpinnings, the software packages that have been developed for proteogenomics, statistical approaches for validating the evidence, and support for proteogenomics data in file formats, standards and databases.
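To make the idea of peptide evidence for a protein-coding region concrete, the following is a minimal sketch of one common proteogenomics step: translating a DNA sequence in all six reading frames and locating an identified peptide in the result. The abstract does not specify this procedure; the function names (`six_frame_translations`, `locate_peptide`), the toy sequence, and the use of exact string matching are assumptions made for illustration only. Real pipelines search mass spectra against such translations and must also handle splicing and statistical validation, which this sketch ignores.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map (strand, offset) to the translation of that reading frame."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def locate_peptide(dna, peptide):
    """Hypothetical helper: find exact peptide matches in all six frames.

    Returns (strand, offset, start, end) tuples, where start and end are
    0-based, half-open coordinates on the forward strand of `dna`.
    """
    n = len(dna)
    hits = []
    for (strand, offset), protein in six_frame_translations(dna).items():
        pos = protein.find(peptide)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(peptide)
            if strand == "-":
                # Convert reverse-strand coordinates back to the forward strand.
                start, end = n - end, n - start
            hits.append((strand, offset, start, end))
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy example (made-up sequence): the peptide MAK is encoded from position 2.
genome = "CCATGGCTAAATGAG"
print(locate_peptide(genome, "MAK"))
```

A hit that falls inside a predicted gene model supports that model as protein-coding, while a hit outside any model, or in a different frame, flags a region where the annotation may need correcting.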
