Discovering Novel Proteoforms Using Proteogenomic Workflows Within the Galaxy Bioinformatics Platform

JCR quartile: Q4 (Biochemistry, Genetics and Molecular Biology)
Praveen Kumar, James E Johnson, Thomas McGowan, Matthew C Chambers, Mohammad Heydarian, Subina Mehta, Caleb Easterly, Timothy J Griffin, Pratik D Jagtap
Methods in Molecular Biology, vol. 2859, pp. 109-128. Published 2025-01-01. DOI: 10.1007/978-1-0716-4152-1_7 (https://doi.org/10.1007/978-1-0716-4152-1_7)
Citations: 0

Abstract


Proteogenomics is a growing "multi-omics" research area that combines mass spectrometry-based proteomics with high-throughput nucleotide sequencing technologies. It has aided genome annotation for organisms whose complete genome sequences became available through high-throughput DNA sequencing. Beyond genome annotation, this multi-omics approach has also helped researchers confirm the expression of variant proteins belonging to unique proteoforms that could have resulted from single-nucleotide polymorphisms (SNPs), insertions and deletions (indels), splice isoforms, or other genome or transcriptome variations.

A proteogenomic study depends on a multistep informatics workflow that requires different software at each step. These integrated steps include creating an appropriate protein sequence database, matching spectral data against these sequences, and finally identifying peptide sequences corresponding to novel proteoforms, followed by variant classification and functional analysis. The disparate software required for a proteogenomic study is difficult for most researchers to access and use, especially those lacking computational expertise. Furthermore, using these tools disjointedly can be error-prone because each one requires its own parameter setup. Consequently, reproducibility suffers. Managing the output files from each tool is an additional challenge.

One solution to these challenges is the open-source, web-based computational platform Galaxy. Its ability to create and manage workflows composed of disparate software while recording and saving all important parameters promotes both usability and reproducibility. Here, we describe a workflow that can perform proteogenomic analysis on a Galaxy-based platform. This Galaxy workflow facilitates matching spectral data against a customized protein sequence database, identifying novel protein variants, assessing the quality of results, and classifying variants along with visualization against the genome.
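
The step of identifying peptides that correspond to novel proteoforms can be illustrated with a minimal sketch. This is not the Galaxy workflow itself or any of its tools: the function names `normalize` and `find_novel_peptides` are hypothetical, and the protein and peptide sequences are invented for illustration. The sketch assumes that a peptide counts as a novel candidate when it does not occur anywhere in the reference proteome, and it treats leucine and isoleucine as equivalent because they have the same mass.

```python
def normalize(seq: str) -> str:
    """Upper-case a sequence and map isoleucine (I) to leucine (L).

    I and L are isobaric, so this sketch treats them as the same residue.
    """
    return seq.upper().replace("I", "L")


def find_novel_peptides(identified_peptides, reference_proteins):
    """Return peptides that are not a substring of any reference protein.

    Hypothetical helper for illustration only; a real analysis would also
    apply quality filters to the peptide-spectrum matches.
    """
    reference = [normalize(protein) for protein in reference_proteins]
    novel = []
    for peptide in identified_peptides:
        key = normalize(peptide)
        if not any(key in protein for protein in reference):
            novel.append(peptide)
    return novel


# Invented example data: one reference protein and four identified peptides.
reference_proteins = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]
identified = ["TAYIAK", "QISFVK", "QLSFVK", "SHFSWQLEER"]

novel = find_novel_peptides(identified, reference_proteins)
print(novel)  # ['SHFSWQLEER']
```

In this example only `SHFSWQLEER` is reported, because it differs from the reference by a single residue (W in place of R), the kind of change a single-nucleotide variant could produce. `QLSFVK` is not reported, since it differs from the reference only by an I/L swap.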

Source journal
Methods in Molecular Biology (Biochemistry, Genetics and Molecular Biology: Genetics)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 3536
About the journal: For over 20 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure that is supported by a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice.