Deciphering Cell Type Abundance in Proteomics Data Through Graph Neural Networks.

Impact factor 14.3 | Zone 1, Materials Science | JCR Q1, Chemistry, Multidisciplinary

Advanced Science | Pub Date: 2025-06-20 | DOI: 10.1002/advs.202502987

Zhiming Dai, Yujie Song, Tuoshi Qi, Hongyu Zhang, Huiying Zhao, Zheng Wang, Yuedong Yang, Yuansong Zeng

Abstract

Recent advancements in proteomics sequencing have significantly enhanced our ability to explore cell-type-specific signatures within complex tissues, providing critical insights into disease mechanisms. However, current proteomic technologies often suffer from low resolution, resulting in the mixing of multiple cell types during profiling. To address this limitation, cell-type deconvolution methods have been developed to infer cellular composition from proteomic data. While most existing deconvolution methods focus on transcriptomics, their application to proteomics is hindered by the weak correlation and divergent quantification between transcriptome and proteome data. Although a few proteomics-specific deconvolution methods have recently emerged, they still exhibit limited capability and performance, partly because they only extract shared information from individual samples while ignoring higher-order relationships between them. Here, GraphDEC, a novel graph neural network-based method for deciphering cell type proportions in proteomic profiling data, is proposed. GraphDEC begins by simulating bulk samples from single-cell proteomic data to create reference data, which is then used to infer cell type proportions in target datasets. Specifically, GraphDEC employs an autoencoder to extract low-dimensional representations from both reference and target proteomic data, enabling the construction of similarity relationships among samples. These relationships, combined with the proteomic data, are processed by a graph neural network that integrates a multi-channel mechanism and a hybrid neighborhood-aware approach to learn highly effective representations. To optimize the model, GraphDEC utilizes multiple loss functions, including triplet loss, domain adaptation loss, and mean squared error (MSE) loss, ensuring robust performance and mitigating batch effects. Benchmark experiments demonstrate that GraphDEC achieves state-of-the-art performance across diverse synthetic proteomic datasets from different sequencing technologies and on real-world spatial proteomic datasets. Furthermore, GraphDEC exhibits strong generalization capabilities, showing high efficiency when applied to cross-species proteomic data and even transcriptomic data.
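
The first two steps described in the abstract (simulating bulk samples from single-cell data, then building similarity relationships among samples) can be illustrated with a minimal sketch. This is not the GraphDEC implementation: the function names `simulate_pseudobulk` and `knn_graph`, the flat Dirichlet prior on proportions, the averaging of sampled cells, and the cosine k-nearest-neighbor rule are all assumptions chosen for illustration. The abstract states that similarities are computed on autoencoder embeddings; here any list of vectors stands in for those embeddings.

```python
import math
import random


def simulate_pseudobulk(cells, labels, n_types, n_cells=100, rng=None):
    """Mix single-cell profiles into one pseudo-bulk sample (illustrative).

    cells  : list of per-cell protein abundance vectors (lists of floats)
    labels : integer cell-type ids in [0, n_types), one per cell
    Returns (bulk_profile, proportions); proportions are the realized
    cell-type fractions of the mixture and serve as the training target.
    """
    rng = rng or random.Random()

    # Group cell indices by type so cells can be sampled within each type.
    by_type = [[] for _ in range(n_types)]
    for idx, t in enumerate(labels):
        by_type[t].append(idx)

    # Assumed prior: flat Dirichlet, drawn as normalized gamma variates.
    # Types with no cells in the reference get zero weight.
    draws = [rng.gammavariate(1.0, 1.0) if by_type[t] else 0.0
             for t in range(n_types)]
    total = sum(draws)
    target = [d / total for d in draws]

    # Sample cells with replacement according to the target fractions.
    chosen_types = rng.choices(range(n_types), weights=target, k=n_cells)
    n_features = len(cells[0])
    bulk = [0.0] * n_features
    counts = [0] * n_types
    for t in chosen_types:
        cell = cells[rng.choice(by_type[t])]
        counts[t] += 1
        for j in range(n_features):
            bulk[j] += cell[j]

    # Average so the profile does not scale with the number of cells mixed.
    bulk = [v / n_cells for v in bulk]
    proportions = [c / n_cells for c in counts]
    return bulk, proportions


def knn_graph(embeddings, k=2):
    """Link each sample to its k most cosine-similar other samples."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    edges = {}
    for i, u in enumerate(embeddings):
        sims = [(cosine(u, v), j) for j, v in enumerate(embeddings) if j != i]
        sims.sort(reverse=True)  # most similar first
        edges[i] = [j for _, j in sims[:k]]
    return edges
```

Calling `simulate_pseudobulk` repeatedly yields labeled reference samples; passing the reference and target vectors together to `knn_graph` gives an adjacency structure over which a graph neural network could propagate information. The multi-channel mechanism and the triplet, domain adaptation, and MSE losses named in the abstract are not covered by this sketch.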

Source journal

Advanced Science (Chemistry, Multidisciplinary; Nanoscience & Nanotechnology)

CiteScore: 18.90
Self-citation rate: 2.60%
Articles published: 1602
Review time: 1.9 months

About the journal: Advanced Science is a prestigious open access journal that focuses on interdisciplinary research in materials science, physics, chemistry, medical and life sciences, and engineering. The journal aims to promote cutting-edge research by employing a rigorous and impartial review process. It is committed to presenting research articles with the highest quality production standards, ensuring maximum accessibility of top scientific findings. With its vibrant and innovative publication platform, Advanced Science seeks to revolutionize the dissemination and organization of scientific knowledge.