Metacell-based differential expression analysis identifies cell type specific temporal gene response programs in COVID-19 patient PBMCs

IF 3.5 · Zone 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY
Kevin O’Leary, Deyou Zheng
Journal: NPJ Systems Biology and Applications
Publication date: 2024-04-05
Publication type: Journal Article
DOI: 10.1038/s41540-024-00364-2 (https://doi.org/10.1038/s41540-024-00364-2)

Abstract

By profiling gene expression in individual cells, single-cell RNA-sequencing (scRNA-seq) can resolve cellular heterogeneity and cell-type gene expression dynamics. Its application to time-series samples can identify temporal gene programs active in different cell types, for example, immune cells’ responses to viral infection. However, current scRNA-seq analysis has limitations. One is the low number of genes detected per cell. The second is insufficient replicates (often 1-2) due to high experimental cost. The third lies in the data analysis—treating individual cells as independent measurements leads to inflated statistics. To address these, we explore a new computational framework, specifically whether “metacells” constructed to maintain cellular heterogeneity within individual cell types (or clusters) can be used as “replicates” for increasing statistical rigor. Toward this, we applied SEACells to a time-series scRNA-seq dataset from peripheral blood mononuclear cells (PBMCs) after SARS-CoV-2 infection to construct metacells, and used them in maSigPro for quadratic regression to find significantly differentially expressed genes (DEGs) over time, followed by clustering expression velocity trends. We showed that such metacells retained greater expression variances and produced more biologically meaningful DEGs compared to either metacells generated randomly or from simple pseudobulk methods. More specifically, this approach correctly identified the known ISG15 interferon response program in almost all PBMC cell types and many DEGs enriched in the previously defined SARS-CoV-2 infection response pathway. It also uncovered additional and more cell type-specific temporal gene expression programs. Overall, our results demonstrate that the metacell-pseudoreplicate strategy could potentially overcome the limitation of 1-2 replicates.
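
The regression step described in the abstract can be illustrated with a minimal sketch. It assumes that metacell-level expression values for one gene in one cell type are already available (for example from SEACells) and treats each metacell as a pseudoreplicate at its time point. This is not the maSigPro implementation: it fits a plain quadratic by ordinary least squares and compares it with an intercept-only model through an F statistic. The function names (`solve_linear`, `quadratic_fit`) and all toy values are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design: need at least three distinct time points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def quadratic_fit(times: Sequence[float], expr: Sequence[float]) -> Tuple[List[float], float]:
    """Fit expr ~ b0 + b1*t + b2*t^2 over metacell pseudoreplicates.

    Returns the coefficients [b0, b1, b2] and the F statistic of the
    quadratic model against an intercept-only (no time effect) model.
    """
    n = len(times)
    if n != len(expr) or n < 4:
        raise ValueError("need matching inputs with at least four metacells")
    design = [[1.0, float(t), float(t) ** 2] for t in times]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, expr)) for i in range(3)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    rss_full = sum((y - f) ** 2 for y, f in zip(expr, fitted))
    mean_y = sum(expr) / n
    rss_null = sum((y - mean_y) ** 2 for y in expr)
    if rss_full == 0.0:
        return beta, float("inf")
    # 2 extra parameters in the full model, n - 3 residual degrees of freedom
    f_stat = ((rss_null - rss_full) / 2) / (rss_full / (n - 3))
    return beta, f_stat


# Toy example: three metacells per time point (arbitrary time units),
# showing a transient induction that peaks at the third time point.
toy_times = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
toy_expr = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9, 3.9, 4.1, 4.0, 3.1, 2.9, 3.0]
toy_beta, toy_f = quadratic_fit(toy_times, toy_expr)
print("coefficients:", toy_beta)
print("F statistic:", toy_f)
```

In a real analysis the F statistic would be converted to a p-value and corrected for multiple testing across genes; those steps are omitted here to keep the sketch dependency-free.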

Source journal
NPJ Systems Biology and Applications (Mathematics-Applied Mathematics)
CiteScore: 5.80
Self-citation rate: 0.00%
Articles published: 46
Review time: 8 weeks
Journal description: npj Systems Biology and Applications is an online Open Access journal dedicated to publishing the premier research that takes a systems-oriented approach. The journal aims to provide a forum for the presentation of articles that help define this nascent field, as well as those that apply the advances to wider fields. We encourage studies that integrate, or aid the integration of, data, analyses and insight from molecules to organisms and broader systems. Important areas of interest include not only fundamental biological systems and drug discovery, but also applications to health, medical practice and implementation, big data, biotechnology, food science, human behaviour, broader biological systems and industrial applications of systems biology. We encourage all approaches, including network biology, application of control theory to biological systems, computational modelling and analysis, comprehensive and/or high-content measurements, theoretical, analytical and computational studies of system-level properties of biological systems and computational/software/data platforms enabling such studies.