To Fly, or Not to Fly, That Is the Question: A Deep Learning Model for Peptide Detectability Prediction in Mass Spectrometry.

IF 3.8 | Zone 2, Biology | JCR Q1, BIOCHEMICAL RESEARCH METHODS

Journal of Proteome Research Pub Date : 2025-05-09 DOI:10.1021/acs.jproteome.4c00973

Naim Abdul-Khalek, Mario Picciani, Omar Shouman, Reinhard Wimmer, Michael Toft Overgaard, Mathias Wilhelm, Simon Gregersen Echers

Citations: 0

Abstract

Identifying detectable peptides, known as flyers, is key in mass spectrometry-based proteomics. Peptide detectability is strongly related to peptide sequences and their resulting physicochemical properties. Moreover, the high variability in MS data challenges the development of a generic model for detectability prediction, underlining the need for customizable tools. We present Pfly, a deep learning model developed to predict peptide detectability based solely on peptide sequence. Pfly is a versatile and reliable state-of-the-art tool, offering high performance, accessibility, and easy customizability for end-users. This adaptability allows researchers to tailor Pfly to specific experimental conditions, improving accuracy and expanding applicability across various research fields. Pfly is an encoder-decoder with an attention mechanism, classifying peptides as flyers or non-flyers, and providing both binary and categorical probabilities for four distinct classes defined in this study. The model was initially trained on a synthetic peptide library and subsequently fine-tuned with a biological dataset to mitigate bias toward synthesizability, improving predictive capacity and outperforming state-of-the-art predictors in benchmark comparisons across different human and cross-species datasets. The study further investigates the influence of protein abundance and rescoring, illustrating the negative impact on peptide identification due to misclassification. Pfly has been integrated into the DLOmix framework and is accessible on GitHub at https://github.com/wilhelm-lab/dlomix.
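
The abstract states that Pfly reports both categorical probabilities over four classes and a binary flyer/non-flyer call. One plausible way to relate the two views is sketched below. This is a minimal illustration, not the authors' implementation: the class names, the assumption that exactly one class means "non-flyer", the 0.5 threshold, and the `summarize` helper are all hypothetical, and the abstract does not say how Pfly derives its binary output.

```python
import math

# Hypothetical class labels: the abstract only says there are four classes.
CLASSES = ("non_flyer", "weak_flyer", "intermediate_flyer", "strong_flyer")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits, threshold=0.5):
    """Return categorical and binary probabilities for one peptide.

    Assumption: the binary "flyer" probability is the combined mass of
    the three flyer classes, i.e. 1 - P(non_flyer).
    """
    if len(logits) != len(CLASSES):
        raise ValueError("expected one score per class")
    categorical = dict(zip(CLASSES, softmax(logits)))
    p_non_flyer = categorical["non_flyer"]
    p_flyer = 1.0 - p_non_flyer
    return {
        "categorical": categorical,
        "binary": {"flyer": p_flyer, "non_flyer": p_non_flyer},
        "label": "flyer" if p_flyer >= threshold else "non_flyer",
    }


# Example scores for a single peptide (made-up numbers).
result = summarize([0.2, 1.5, 0.3, -0.4])
print(result["label"], round(result["binary"]["flyer"], 3))
```

Under these assumptions the binary decision is just a coarser view of the four-class output, so a peptide can be called a flyer even when no single flyer class dominates.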

Source journal

Journal of Proteome Research (Biology: Biochemical Research Methods)

CiteScore: 9.00
Self-citation rate: 4.50%
Articles published: 251
Review time: 3 months

Journal description: Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".