Efficient differential expression analysis of large-scale single cell transcriptomics data using dreamlet.

bioRxiv: the preprint server for biology. Pub Date: 2024-11-20. DOI: 10.1101/2023.03.17.533005

Gabriel E Hoffman, Donghoon Lee, Jaroslav Bendl, N M Prashant, Aram Hong, Clara Casey, Marcela Alvia, Zhiping Shao, Stathis Argyriou, Karen Therrien, Sanan Venkatesh, Georgios Voloudakis, Vahram Haroutunian, John F Fullard, Panos Roussos



Abstract

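Advances in single-cell and -nucleus transcriptomics have enabled generation of increasingly large-scale datasets from hundreds of subjects and millions of cells. These studies promise to give unprecedented insight into the cell type specific biology of human disease. Yet performing differential expression analyses across subjects remains difficult due to challenges in statistical modeling of these complex studies and scaling analyses to large datasets. Our open-source R package dreamlet (DiseaseNeurogenomics.github.io/dreamlet) uses a pseudobulk approach based on precision-weighted linear mixed models to identify genes differentially expressed with traits across subjects for each cell cluster. Designed for data from large cohorts, dreamlet is substantially faster and uses less memory than existing workflows, while supporting complex statistical models and controlling the false positive rate. We demonstrate computational and statistical performance on published datasets, and a novel dataset of 1.4M single nuclei from postmortem brains of 150 Alzheimer's disease cases and 149 controls.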
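As a rough illustration of the pseudobulk idea mentioned in the abstract, the sketch below sums per-cell gene counts within each subject and cell cluster, so that later modeling works on one profile per (subject, cluster) pair rather than on individual cells. This is not dreamlet's implementation or API (dreamlet is an R package); the function name `pseudobulk`, the input layout, and the toy gene and subject names are all assumptions made for illustration. The precision-weighted linear mixed model that would be fit to these aggregated counts is not shown.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell gene counts within each (subject, cluster) pair.

    `cells` is an iterable of (subject, cluster, counts) tuples, where
    `counts` maps a gene name to that cell's integer count.
    Returns (totals, n_cells):
      totals[(subject, cluster)][gene] is the summed count,
      n_cells[(subject, cluster)] is the number of cells aggregated.
    """
    totals = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for subject, cluster, counts in cells:
        key = (subject, cluster)
        n_cells[key] += 1
        for gene, count in counts.items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {k: dict(v) for k, v in totals.items()}, dict(n_cells)


# Hypothetical toy data: subjects, clusters, and gene names are made up.
toy_cells = [
    ("S1", "neuron", {"GENE_A": 3, "GENE_B": 0}),
    ("S1", "neuron", {"GENE_A": 1, "GENE_B": 2}),
    ("S1", "glia", {"GENE_A": 0, "GENE_B": 5}),
    ("S2", "neuron", {"GENE_A": 4, "GENE_B": 1}),
]

totals, n_cells = pseudobulk(toy_cells)
```

The cell count per pair is returned alongside the sums because pseudobulk profiles built from few cells are noisier than those built from many. A weighting scheme could plausibly use that information, though how dreamlet derives its precision weights is not described in the abstract.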


