ProtGraph：一个工具，用于快速和全面的探索和利用肽搜索空间衍生的蛋白质序列数据库使用图形。

IF 6.8 2区生物学 Q1 BIOCHEMICAL RESEARCH METHODS

Briefings in bioinformatics Pub Date : 2024-11-22 DOI:10.1093/bib/bbae671

Dominik Lux, Katrin Marcus-Alic, Martin Eisenacher, Julian Uszkoreit

{"title":"ProtGraph：一个工具，用于快速和全面的探索和利用肽搜索空间衍生的蛋白质序列数据库使用图形。","authors":"Dominik Lux, Katrin Marcus-Alic, Martin Eisenacher, Julian Uszkoreit","doi":"10.1093/bib/bbae671","DOIUrl":null,"url":null,"abstract":"Due to computational resource limitations, in mass spectrometry based proteomics only a limited set of peptide sequences is used for the matching against measured spectra. We present an approach to represent proteins by graphs and allow not only the canonical sequences but also known isoforms and annotated amino acid variations, e.g. originating from genomic mutations, and further common protein sequence features contained in Uniprot KB or other protein databases. Our C++ and Python implementation enables a groundbreaking comprehensive characterization of the peptide search space, encompassing for the first time all available annotations in a protein database (in combination more than $10^{200}$ possibilities). Additionally, it can be used to quickly extract the relevant subset of the search space for peptide to spectrum matching, e.g. filtering by the peptide mass. We demonstrate the advantages and innovative findings of our implementation compared to previous workflows by re-analysing publicly available datasets.","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"ProtGraph: a tool for the quick and comprehensive exploration and exploitation of the peptide search space derived from protein sequence databases using graphs.\",\"authors\":\"Dominik Lux, Katrin Marcus-Alic, Martin Eisenacher, Julian Uszkoreit\",\"doi\":\"10.1093/bib/bbae671\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to computational resource limitations, in mass spectrometry based proteomics only a limited set of peptide sequences is used for the matching against measured spectra. We present an approach to represent proteins by graphs and allow not only the canonical sequences but also known isoforms and annotated amino acid variations, e.g. originating from genomic mutations, and further common protein sequence features contained in Uniprot KB or other protein databases. Our C++ and Python implementation enables a groundbreaking comprehensive characterization of the peptide search space, encompassing for the first time all available annotations in a protein database (in combination more than $10^{200}$ possibilities). Additionally, it can be used to quickly extract the relevant subset of the search space for peptide to spectrum matching, e.g. filtering by the peptide mass. We demonstrate the advantages and innovative findings of our implementation compared to previous workflows by re-analysing publicly available datasets.\",\"PeriodicalId\":9209,\"journal\":{\"name\":\"Briefings in bioinformatics\",\"volume\":\"26 1\",\"pages\":\"\"},\"PeriodicalIF\":6.8000,\"publicationDate\":\"2024-11-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Briefings in bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1093/bib/bbae671\",\"RegionNum\":2,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Briefings in bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1093/bib/bbae671","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

由于计算资源的限制，在基于质谱的蛋白质组学中，只有一组有限的肽序列用于与测量光谱的匹配。我们提出了一种用图形表示蛋白质的方法，不仅允许规范序列，还允许已知的同型异构体和注释的氨基酸变异，例如源自基因组突变，以及包含在Uniprot KB或其他蛋白质数据库中的进一步常见蛋白质序列特征。我们的c++和Python实现实现了突破性的肽搜索空间的全面表征，首次包含了蛋白质数据库中所有可用的注释（组合超过10^{200}$的可能性）。此外，它还可以用于快速提取肽搜索空间的相关子集以进行谱匹配，例如通过肽质量进行过滤。通过重新分析公开可用的数据集，我们展示了与以前的工作流程相比，我们实现的优势和创新发现。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

ProtGraph: a tool for the quick and comprehensive exploration and exploitation of the peptide search space derived from protein sequence databases using graphs.

Due to computational resource limitations, in mass spectrometry based proteomics only a limited set of peptide sequences is used for the matching against measured spectra. We present an approach to represent proteins by graphs and allow not only the canonical sequences but also known isoforms and annotated amino acid variations, e.g. originating from genomic mutations, and further common protein sequence features contained in Uniprot KB or other protein databases. Our C++ and Python implementation enables a groundbreaking comprehensive characterization of the peptide search space, encompassing for the first time all available annotations in a protein database (in combination more than $10^{200}$ possibilities). Additionally, it can be used to quickly extract the relevant subset of the search space for peptide to spectrum matching, e.g. filtering by the peptide mass. We demonstrate the advantages and innovative findings of our implementation compared to previous workflows by re-analysing publicly available datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Briefings in bioinformatics 生物-生化研究方法

CiteScore

13.20

自引率

13.70%

发文量

549

审稿时长

6 months

期刊介绍： Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.