Efficient differential expression analysis of large-scale single cell transcriptomics data using dreamlet.

Gabriel E Hoffman, Donghoon Lee, Jaroslav Bendl, N M Prashant, Aram Hong, Clara Casey, Marcela Alvia, Zhiping Shao, Stathis Argyriou, Karen Therrien, Sanan Venkatesh, Georgios Voloudakis, Vahram Haroutunian, John F Fullard, Panos Roussos
Journal: bioRxiv : the preprint server for biology
Publication date: 2024-11-20
DOI: 10.1101/2023.03.17.533005 (https://doi.org/10.1101/2023.03.17.533005)
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/56/f1/nihpp-2023.03.17.533005v1.PMC10055252.pdf

Abstract

Advances in single-cell and single-nucleus transcriptomics have enabled the generation of increasingly large-scale datasets from hundreds of subjects and millions of cells. These studies promise to give unprecedented insight into the cell-type-specific biology of human disease. Yet performing differential expression analyses across subjects remains difficult due to challenges in statistical modeling of these complex studies and in scaling analyses to large datasets. Our open-source R package dreamlet (DiseaseNeurogenomics.github.io/dreamlet) uses a pseudobulk approach based on precision-weighted linear mixed models to identify genes differentially expressed with traits across subjects for each cell cluster. Designed for data from large cohorts, dreamlet is substantially faster and uses less memory than existing workflows, while supporting complex statistical models and controlling the false positive rate. We demonstrate computational and statistical performance on published datasets, and on a novel dataset of 1.4M single nuclei from postmortem brains of 150 Alzheimer's disease cases and 149 controls.
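
To make the pseudobulk idea concrete, the following is a minimal sketch in Python of the aggregation step only: raw counts are summed over all cells that share a subject and a cell cluster, giving one expression profile per subject per cluster. It is not dreamlet's implementation (dreamlet is an R package, and the abstract does not describe its internals). The function name `aggregate_to_pseudobulk`, the tuple layout of the input, and the toy gene and subject names are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_to_pseudobulk(cells):
    """Sum raw counts over all cells sharing a (cluster, subject) pair.

    `cells` is an iterable of (subject_id, cluster_id, counts) tuples, where
    `counts` maps a gene name to the raw count observed in one cell.
    This input layout is hypothetical and chosen only for illustration.
    """
    sums = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for subject_id, cluster_id, counts in cells:
        key = (cluster_id, subject_id)
        n_cells[key] += 1  # track how many cells went into each profile
        for gene, count in counts.items():
            sums[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in sums.items()}, dict(n_cells)


# Toy data: four cells from two subjects and two clusters (made-up values).
cells = [
    ("subj1", "neuron", {"GENE_A": 3, "GENE_B": 0}),
    ("subj1", "neuron", {"GENE_A": 1, "GENE_B": 2}),
    ("subj1", "astrocyte", {"GENE_A": 0, "GENE_B": 5}),
    ("subj2", "neuron", {"GENE_A": 4, "GENE_B": 1}),
]
pseudobulk, n_cells = aggregate_to_pseudobulk(cells)
```

Each key of `pseudobulk` is a (cluster, subject) pair, so a downstream model can be fit separately within each cluster with subjects as the unit of observation. The sketch also returns the number of cells behind each profile, since profiles built from few cells are noisier, and a precision-weighting scheme could plausibly draw on that kind of information. How dreamlet actually computes its weights is not stated in this abstract.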

