GraphPI: Efficient Protein Inference with Graph Neural Networks
Authors: Zheng Ma, Jiazhen Chen, Lei Xin and Ali Ghodsi*
Journal: Journal of Proteome Research (impact factor 3.8; JCR Q1, Biochemical Research Methods)
Publication date: 2024-10-13
Publication type: Journal Article
DOI: 10.1021/acs.jproteome.3c00845
URL: https://pubs.acs.org/doi/10.1021/acs.jproteome.3c00845
Citations: 0
Abstract
The integration of deep learning approaches in biomedical research has been transformative, enabling breakthroughs in various applications. Despite these strides, its application in protein inference is impeded by the scarcity of extensively labeled data sets, a challenge compounded by the high costs and complexities of accurate protein annotation. In this study, we introduce GraphPI, a novel framework that treats protein inference as a node classification problem. We treat proteins as interconnected nodes within a protein–peptide–PSM graph, utilizing a graph neural network-based architecture to elucidate their interrelations. To address label scarcity, we train the model on a set of unlabeled public protein data sets with pseudolabels derived from an existing protein inference algorithm, enhanced by self-training to iteratively refine labels based on confidence scores. Contrary to prevalent methodologies necessitating data set-specific training, our research illustrates that GraphPI, due to the well-normalized nature of Percolator features, exhibits universal applicability without data set-specific fine-tuning, a feature that not only mitigates the risk of overfitting but also enhances computational efficiency. Our empirical experiments reveal notable performance on various test data sets and deliver significantly reduced computation times compared to common protein inference algorithms.
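The pipeline described in the abstract (a protein–peptide–PSM graph, pseudolabels from an existing inference algorithm, and confidence-based self-training) can be illustrated with a minimal sketch. This is not the authors' implementation: a hand-written aggregation pass and a one-feature logistic regression stand in for the graph neural network, plain PSM scores stand in for Percolator features, and all names (`build_graph`, `protein_features`, `self_train`) and the thresholds `hi` and `lo` are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def build_graph(psms, peptide_to_proteins):
    """Build edge maps for a protein-peptide-PSM graph.

    psms: list of (peptide, psm_score) pairs (scores are illustrative).
    peptide_to_proteins: dict mapping each peptide to its candidate proteins.
    """
    peptide_psms = defaultdict(list)     # peptide -> scores of its PSMs
    protein_peptides = defaultdict(set)  # protein -> peptides mapped to it
    for peptide, score in psms:
        peptide_psms[peptide].append(score)
        for protein in peptide_to_proteins[peptide]:
            protein_peptides[protein].add(peptide)
    return peptide_psms, protein_peptides


def protein_features(peptide_psms, protein_peptides):
    """One aggregation pass PSM -> peptide -> protein (stand-in for a GNN layer)."""
    degree = defaultdict(int)  # number of proteins sharing each peptide
    for peptides in protein_peptides.values():
        for pep in peptides:
            degree[pep] += 1
    # Best PSM score per peptide, down-weighted when the peptide is shared.
    return {
        protein: sum(max(peptide_psms[pep]) / degree[pep] for pep in peptides)
        for protein, peptides in protein_peptides.items()
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, epochs=500, lr=0.5):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def self_train(feats, pseudolabels, rounds=3, hi=0.9, lo=0.1):
    """Refit on the current labels, then overwrite labels the model is confident about."""
    labels = dict(pseudolabels)
    for _ in range(rounds):
        names = sorted(labels)
        w, b = fit_logistic([feats[p] for p in names], [labels[p] for p in names])
        for protein, x in feats.items():
            confidence = sigmoid(w * x + b)
            if confidence >= hi:
                labels[protein] = 1
            elif confidence <= lo:
                labels[protein] = 0
    return labels


# Toy input: PEPB and PEPD are shared peptides.
psms = [("PEPA", 0.9), ("PEPA", 0.7), ("PEPB", 0.8), ("PEPC", 0.2), ("PEPD", 0.1)]
peptide_to_proteins = {
    "PEPA": ["P1"], "PEPB": ["P1", "P2"], "PEPC": ["P3"], "PEPD": ["P3", "P4"],
}
peptide_psms, protein_peptides = build_graph(psms, peptide_to_proteins)
feats = protein_features(peptide_psms, protein_peptides)
# Pseudolabels would come from an existing protein inference tool; these are made up.
refined = self_train(feats, {"P1": 1, "P2": 1, "P3": 0, "P4": 0})
```

In this sketch the pseudolabels only seed the loop; each round may overwrite a label when the refit model is sufficiently confident, which is one simple reading of "iteratively refine labels based on confidence scores".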
About the journal:
Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".