Deep Learning Enhances Precision of Citrullination Identification in Human and Plant Tissue Proteomes.

IF 6.1 | Zone 2, Biology | JCR Q1, Biochemical Research Methods
Molecular & Cellular Proteomics Pub Date : 2025-03-01 Epub Date: 2025-02-05 DOI:10.1016/j.mcpro.2025.100924
Wassim Gabriel, Rebecca Meelker González, Sophia Laposchan, Erik Riedel, Gönül Dündar, Brigitte Poppenberger, Mathias Wilhelm, Chien-Yun Lee

Abstract

Citrullination is a critical yet understudied post-translational modification (PTM) implicated in various biological processes. Exploring its role in health and disease requires a comprehensive understanding of the prevalence of this PTM at a proteome-wide scale. Although mass spectrometry has enabled the identification of citrullination sites in complex biological samples, it faces significant challenges, including limited enrichment tools and a high rate of false positives. These false positives arise because citrullination produces the same mass shift as deamidation (+0.9840 Da) and because of errors in monoisotopic ion selection. These issues often necessitate manual spectrum inspection, reducing throughput in large-scale studies. In this work, we present a novel data analysis pipeline that incorporates the deep learning model Prosit-Cit into the MS database search workflow to improve both the sensitivity and the precision of citrullination site identification. Prosit-Cit, an extension of the existing Prosit model, has been trained on ∼53,000 spectra from ∼2,500 synthetic citrullinated peptides and provides precise predictions for chromatographic retention time and fragment ion intensities of both citrullinated and deamidated peptides. This enhances the accuracy of identification and reduces false positives. Our pipeline demonstrated high precision on the evaluation dataset, recovering the majority of known citrullination sites in human tissue proteomes and improving sensitivity by identifying up to 14 times more citrullination sites. Sequence motif analysis revealed consistency with previously reported findings, validating the reliability of our approach. Furthermore, extending the pipeline to a tissue proteome dataset of the model plant Arabidopsis thaliana enabled the identification of ∼200 citrullination sites across 169 proteins from 30 tissues, representing the first large-scale citrullination mapping in plants. This pipeline can be seamlessly applied to existing proteomics datasets, offering a robust tool for advancing biological discoveries and deepening our understanding of protein citrullination across species.
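
The mass ambiguity described in the abstract can be illustrated with a short calculation. The sketch below is not part of the published pipeline; it only uses standard monoisotopic atomic masses to show why precursor mass alone cannot separate citrullination from deamidation, and why a wrongly selected monoisotopic peak (one 13C isotope too heavy) can look similar at loose mass tolerances. The function names, the 2,000 Da example peptide, and the tolerance values are illustrative assumptions.

```python
# Illustrative sketch only: names, the example peptide mass, and the
# tolerances are assumptions, not values taken from the paper.

# Standard monoisotopic atomic masses in daltons.
MASS_H = 1.00782503
MASS_N = 14.00307401
MASS_O = 15.99491462
MASS_C12 = 12.0
MASS_C13 = 13.00335484


def conversion_shift():
    """Mass gained when NH is replaced by O.

    The same net change applies to citrullination (Arg -> Cit) and to
    deamidation (Asn -> Asp, Gln -> Glu), so the two share one mass shift.
    """
    return MASS_O - (MASS_N + MASS_H)


def isotope_spacing():
    """Mass difference between 13C and 12C (a one-isotope precursor offset)."""
    return MASS_C13 - MASS_C12


def ppm(delta_da, peptide_mass_da):
    """Express a mass difference in parts per million of the peptide mass."""
    return delta_da / peptide_mass_da * 1e6


def candidate_explanations(observed_shift_da, peptide_mass_da, tol_ppm):
    """List the explanations compatible with an observed mass shift."""
    candidates = {
        "citrullination or deamidation": conversion_shift(),
        "monoisotopic peak off by one 13C": isotope_spacing(),
    }
    return [
        label
        for label, shift in candidates.items()
        if abs(ppm(observed_shift_da - shift, peptide_mass_da)) <= tol_ppm
    ]


if __name__ == "__main__":
    peptide_mass = 2000.0  # hypothetical peptide mass in Da
    gap = isotope_spacing() - conversion_shift()
    print(f"shared shift: {conversion_shift():.4f} Da")
    print(f"gap to 13C spacing: {gap:.4f} Da = {ppm(gap, peptide_mass):.2f} ppm")
    print(candidate_explanations(0.98402, peptide_mass, tol_ppm=5.0))
    print(candidate_explanations(0.98402, peptide_mass, tol_ppm=20.0))
```

Under these assumptions, a tight tolerance excludes the isotope error but still leaves citrullination and deamidation indistinguishable by mass. This is consistent with the abstract's motivation for using predicted retention times and fragment ion intensities as additional evidence.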

Source journal
Molecular & Cellular Proteomics (Biology: Biochemical Research Methods)
CiteScore: 11.50
Self-citation rate: 4.30%
Articles published: 131
Review time: 84 days
Journal description: The mission of MCP is to foster the development and applications of proteomics in both basic and translational research. MCP will publish manuscripts that report significant new biological or clinical discoveries underpinned by proteomic observations across all kingdoms of life. Manuscripts must define the biological roles played by the proteins investigated or their mechanisms of action. The journal also emphasizes articles that describe innovative new computational methods and technological advancements that will enable future discoveries. Manuscripts describing such approaches do not have to include a solution to a biological problem, but must demonstrate that the technology works as described, is reproducible, and is appropriate to uncover yet unknown protein/proteome function or properties using relevant model systems or publicly available data.
Scope:
-Fundamental studies in biology, including integrative "omics" studies, that provide mechanistic insights
-Novel experimental and computational technologies
-Proteogenomic data integration and analysis that enable greater understanding of physiology and disease processes
-Pathway and network analyses of signaling that focus on the roles of post-translational modifications
-Studies of proteome dynamics and quality controls, and their roles in disease
-Studies of evolutionary processes affecting proteome dynamics, quality and regulation
-Chemical proteomics, including mechanisms of drug action
-Proteomics of the immune system and antigen presentation/recognition
-Microbiome proteomics, host-microbe and host-pathogen interactions, and their roles in health and disease
-Clinical and translational studies of human diseases
-Metabolomics to understand functional connections between genes, proteins and phenotypes