Deciphering Cell Type Abundance in Proteomics Data Through Graph Neural Networks
Authors: Zhiming Dai, Yujie Song, Tuoshi Qi, Hongyu Zhang, Huiying Zhao, Zheng Wang, Yuedong Yang, Yuansong Zeng
Journal: Advanced Science, article e02987 (Journal Article; published 2025-06-20)
DOI: 10.1002/advs.202502987 (https://doi.org/10.1002/advs.202502987)
Impact factor: 14.3; JCR: Q1 (Chemistry, Multidisciplinary)
Citations: 0
Abstract
Recent advancements in proteomics sequencing have significantly enhanced our ability to explore cell-type-specific signatures within complex tissues, providing critical insights into disease mechanisms. However, current proteomic technologies often suffer from low resolution, resulting in the mixing of multiple cell types during profiling. To address this limitation, cell-type deconvolution methods have been developed to infer cellular composition from proteomic data. Most existing deconvolution methods focus on transcriptomics, and their application to proteomics is hindered by the weak correlation and divergent quantification between transcriptome and proteome data. Although a few proteomics-specific deconvolution methods have recently emerged, they still exhibit limited capability and performance, partly because they extract only shared information from individual samples while ignoring higher-order relationships between them. Here, we propose GraphDEC, a novel graph neural network-based method for deciphering cell type proportions in proteomic profiling data. GraphDEC begins by simulating bulk samples from single-cell proteomic data to create reference data, which are then used to infer cell types in target datasets. Specifically, GraphDEC employs an autoencoder to extract low-dimensional representations from both reference and target proteomic data, enabling the construction of similarity relationships among samples. These relationships, combined with the proteomic data, are processed by a graph neural network that integrates a multi-channel mechanism and a hybrid neighborhood-aware approach to learn highly effective representations. To optimize the model, GraphDEC uses multiple loss functions, including triplet loss, domain adaptation loss, and mean squared error (MSE) loss, ensuring robust performance and mitigating batch effects. 
Benchmark experiments demonstrate that GraphDEC achieves state-of-the-art performance across diverse synthetic proteomic datasets from different sequencing technologies and real-world spatial proteomic datasets. Furthermore, GraphDEC exhibits strong generalization capabilities, showing high efficiency when applied to cross-species proteomic data and even transcriptomics.
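The first two steps described in the abstract (simulating bulk samples from single-cell data, then linking samples by similarity) can be illustrated with a minimal NumPy sketch. This is not the GraphDEC implementation: the function names `simulate_pseudo_bulk` and `knn_graph`, the flat Dirichlet prior over proportions, the number of cells per sample, and the cosine k-nearest-neighbour rule are all assumptions made for illustration, and the log-transformed bulk profiles stand in for the autoencoder embeddings that the abstract describes.

```python
import numpy as np


def simulate_pseudo_bulk(expr, labels, n_samples=100, cells_per_sample=50, seed=0):
    """Mix single cells into pseudo-bulk samples with known proportions.

    expr:   (n_cells, n_proteins) single-cell abundance matrix
    labels: (n_cells,) integer cell-type id per cell
    Returns (bulk, props) with shapes (n_samples, n_proteins) and
    (n_samples, n_cell_types).
    """
    rng = np.random.default_rng(seed)
    cell_types = np.unique(labels)
    n_types = len(cell_types)
    bulk = np.zeros((n_samples, expr.shape[1]))
    props = np.zeros((n_samples, n_types))
    for i in range(n_samples):
        # Assumption: target proportions come from a flat Dirichlet prior.
        target = rng.dirichlet(np.ones(n_types))
        counts = rng.multinomial(cells_per_sample, target)
        for j, cell_type in enumerate(cell_types):
            if counts[j] == 0:
                continue
            pool = np.flatnonzero(labels == cell_type)
            chosen = rng.choice(pool, size=counts[j], replace=True)
            bulk[i] += expr[chosen].sum(axis=0)
        # Ground-truth label is the realised fraction of sampled cells.
        props[i] = counts / cells_per_sample
    return bulk, props


def knn_graph(embeddings, k=5):
    """Symmetric boolean kNN adjacency built from cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    n = embeddings.shape[0]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = True
    return adj | adj.T  # keep an edge if either endpoint selected it


# Toy usage with random data: 300 cells, 40 proteins, 4 cell types.
demo_rng = np.random.default_rng(1)
demo_expr = demo_rng.gamma(shape=2.0, scale=1.0, size=(300, 40))
demo_labels = demo_rng.integers(0, 4, size=300)
demo_bulk, demo_props = simulate_pseudo_bulk(demo_expr, demo_labels, n_samples=20)
# Stand-in for learned embeddings: log-transformed bulk profiles.
demo_adj = knn_graph(np.log1p(demo_bulk), k=5)
```

In a pipeline of the kind the abstract outlines, the simulated proportions would serve as regression targets for the MSE loss, and the adjacency matrix would define the graph passed to the graph neural network; how GraphDEC actually builds and weights its graph is not specified in the abstract.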
About the journal:
Advanced Science is a prestigious open access journal that focuses on interdisciplinary research in materials science, physics, chemistry, medical and life sciences, and engineering. The journal aims to promote cutting-edge research by employing a rigorous and impartial review process. It is committed to presenting research articles with the highest quality production standards, ensuring maximum accessibility of top scientific findings. With its vibrant and innovative publication platform, Advanced Science seeks to revolutionize the dissemination and organization of scientific knowledge.