Imputing abundance of over 2,500 surface proteins from single-cell transcriptomes with context-agnostic zero-shot deep ensembles.

Cell Systems. Pub Date: 2024-09-18. Epub Date: 2024-09-06. DOI: 10.1016/j.cels.2024.08.006

Ruoqiao Chen, Jiayu Zhou, Bin Chen

Cell Systems, pages 869-884.e6. Publication type: Journal Article. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423933/pdf/


Abstract

Cell surface proteins serve as primary drug targets and cell identity markers. Techniques such as CITE-seq (cellular indexing of transcriptomes and epitopes by sequencing) have enabled the simultaneous quantification of surface protein abundance and transcript expression within individual cells. The published data have been utilized to train machine learning models for predicting surface protein abundance solely from transcript expression. However, the small scale of proteins predicted and the poor generalization ability of these computational approaches across diverse contexts (e.g., different tissues/disease states) impede their widespread adoption. Here, we propose SPIDER (surface protein prediction using deep ensembles from single-cell RNA sequencing), a context-agnostic zero-shot deep ensemble model, which enables large-scale protein abundance prediction and generalizes better to various contexts. Comprehensive benchmarking shows that SPIDER outperforms other state-of-the-art methods. Using the predicted surface abundance of >2,500 proteins from single-cell transcriptomes, we demonstrate the broad applications of SPIDER, including cell type annotation, biomarker/target identification, and cell-cell interaction analysis in hepatocellular carcinoma and colorectal cancer. A record of this paper's transparent peer review process is included in the supplemental information.


