Accounting for Digestion Enzyme Bias in Casanovo

IF 3.6 · Zone 2 (Biology) · Q1 · Biochemical Research Methods

Journal of Proteome Research · Vol. 23, No. 10, pp. 4761–4769 · Pub Date: 2024-08-30 · DOI: 10.1021/acs.jproteome.4c00422

Carlo Melendez, Justin Sanders, Melih Yilmaz, Wout Bittremieux, William E. Fondrie, Sewoong Oh, and William Stafford Noble*


Citations: 0

Abstract



A key parameter of any bottom-up proteomics mass spectrometry experiment is the identity of the enzyme that is used to digest proteins in the sample into peptides. The Casanovo de novo sequencing model was trained using data that was generated with trypsin digestion; consequently, the model prefers to predict peptides that end with the amino acids “K” or “R”. This bias is desirable when Casanovo is used to analyze data that was also generated using trypsin but can be problematic if the data was generated using some other digestion enzyme. In this work, we modify Casanovo to take as input the identity of the digestion enzyme alongside each observed spectrum. We then train Casanovo with data generated by using several different enzymes, and we demonstrate that the resulting model successfully learns to capture enzyme-specific behavior. However, we find, surprisingly, that this new model does not yield a significant improvement in sequencing accuracy relative to a model trained without enzyme information but using the same training set. This observation may have important implications for future attempts to make use of experimental metadata in de novo sequencing models.
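The tryptic bias described in the abstract can be checked on a list of predicted peptides by counting how many end in “K” or “R”. The sketch below is only an illustration and is not the evaluation used in the paper: the function names, the `EXPECTED_C_TERMINI` table, and the example sequences are all hypothetical, the Lys-C entry is a textbook cleavage rule added for contrast, and peptides are assumed to be plain one-letter sequences without modification annotations.

```python
from collections import Counter

# Residues expected at the C-terminus after cleavage (illustrative table).
# Trypsin cleaves after K or R; the Lys-C rule is an added assumption.
EXPECTED_C_TERMINI = {
    "trypsin": {"K", "R"},
    "lys-c": {"K"},
}


def c_terminal_counts(peptides):
    """Count the last residue of each non-empty peptide sequence."""
    return Counter(p[-1] for p in peptides if p)


def fraction_matching_enzyme(peptides, enzyme):
    """Fraction of peptides whose C-terminal residue fits the enzyme's rule."""
    expected = EXPECTED_C_TERMINI[enzyme]
    counts = c_terminal_counts(peptides)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    matching = sum(n for residue, n in counts.items() if residue in expected)
    return matching / total


# Hypothetical predictions: three end in K or R, one does not.
predictions = ["LGEYGFQNALK", "AEFVEVTK", "HLVDEPQNLIR", "YLYEIAF"]
print(fraction_matching_enzyme(predictions, "trypsin"))
```

A fraction close to 1 on data that was not digested with trypsin would be one sign of the bias the abstract describes; comparing the same fraction across enzymes gives a rough view of whether a model has picked up enzyme-specific behavior.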


Source journal

Journal of Proteome Research (Biology: Biochemical Research Methods)

CiteScore: 9.00

Self-citation rate: 4.50%

Articles published: 251

Review time: 3 months

Journal description: Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".