Finding haplotypic signatures in proteins

IF 11.8 · Zone 2 (Biology) · Q1 · Multidisciplinary Sciences
J. Vašíček, Dafni Skiadopoulou, K. Kuznetsova, Bo Wen, S. Johansson, P. Njølstad, Stefan Bruckner, L. Käll, Marc Vaudel
DOI: 10.1101/2022.11.21.517096
Journal: GigaScience
Publication date: 2022-12-28
Publication type: Journal Article
Citations: 0

Abstract

The non-random distribution of alleles of common genomic variants produces haplotypes, which are fundamental in medical and population genetic studies. Consequently, protein-coding genes with different co-occurring sets of alleles can encode different amino acid sequences: protein haplotypes. These protein haplotypes are present in biological samples and detectable by mass spectrometry, but they are not accounted for in proteomic searches. As a result, the impact of haplotypic variation on the results of proteomic searches and the discoverability of haplotype-specific peptides remain unknown. Here, we study how common genetic haplotypes influence the proteomic search space and investigate whether peptides containing multiple amino acid substitutions can be matched to a publicly available data set of mass spectra. We found that for 9.96% of the discoverable amino acid substitutions encoded by common haplotypes, two or more substitutions may co-occur in the same peptide after tryptic digestion of the protein haplotypes. We identified 342 spectra that matched such multi-variant peptides, and of the 4,251 amino acid substitutions identified, 6.63% were covered by multi-variant peptides. However, evaluating the reliability of these matches remains challenging, suggesting that refined error rate estimation procedures are needed for such complex proteomic searches. As these become available and the ability to analyze protein haplotypes increases, we anticipate that proteomics will provide new information on the consequences of common variation across tissues and time.
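To make the idea of a multi-variant peptide concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a simplified trypsin rule (cleave after K or R unless the next residue is P, with no missed cleavages), and the function names, the toy haplotype sequence, and the substitution positions are all hypothetical. It digests a protein haplotype in silico and reports which substitutions land in the same peptide.

```python
def tryptic_peptides(sequence):
    """Return (start, end) spans of tryptic peptides, 0-based and half-open.

    Simplified rule (an assumption): cleave after K or R unless the next
    residue is P; no missed cleavages are considered.
    """
    spans = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            spans.append((start, i + 1))
            start = i + 1
    if start < len(sequence):
        spans.append((start, len(sequence)))
    return spans


def group_substitutions(haplotype_seq, substitution_positions):
    """Map each peptide that carries a substitution to the positions it covers.

    The haplotype sequence itself is digested, so substitutions that create
    or remove a cleavage site are reflected in the peptide boundaries.
    """
    covered_by_peptide = {}
    for start, end in tryptic_peptides(haplotype_seq):
        covered = [p for p in substitution_positions if start <= p < end]
        if covered:
            covered_by_peptide[haplotype_seq[start:end]] = covered
    return covered_by_peptide


def multi_variant_fraction(covered_by_peptide):
    """Fraction of substitutions sharing a peptide with at least one other."""
    total = sum(len(v) for v in covered_by_peptide.values())
    shared = sum(len(v) for v in covered_by_peptide.values() if len(v) >= 2)
    return shared / total if total else 0.0


# Toy haplotype (hypothetical) with substitutions at 0-based positions 6, 8, 17.
haplotype = "MAGTKLTDDVRPQWKAAGHR"
groups = group_substitutions(haplotype, [6, 8, 17])
print(groups)                          # {'LTDDVRPQWK': [6, 8], 'AAGHR': [17]}
print(multi_variant_fraction(groups))  # 2 of 3 substitutions share a peptide
```

In this toy case two of the three substitutions fall in one peptide, so a search database that lists each substitution only on its own would miss that peptide. The percentages reported in the abstract come from the authors' analysis of common haplotypes, not from this sketch.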
Source journal: GigaScience (Multidisciplinary Sciences)
CiteScore: 15.50
Self-citation rate: 1.10%
Articles published: 119
Review time: 1 week
Journal description: GigaScience seeks to transform data dissemination and utilization in the life and biomedical sciences. As an online open-access open-data journal, it specializes in publishing "big-data" studies encompassing various fields. Its scope includes not only "omic" type data and the fields of high-throughput biology currently serviced by large public repositories, but also the growing range of more difficult-to-access data, such as imaging, neuroscience, ecology, cohort data, systems biology and other new types of large-scale shareable data.