{"title":"Nonparametric Bayesian Two-Level Clustering for Subject-Level Single-Cell Expression Data","authors":"Qiuyu Wu, Xiangyu Luo","doi":"10.5705/ss.202020.0337","DOIUrl":null,"url":null,"abstract":"The advent of single-cell sequencing opens new avenues for personalized treatment. In this paper, we address a two-level clustering problem of simultaneous subject subgroup discovery (subject level) and cell type detection (cell level) for single-cell expression data from multiple subjects. However, current statistical approaches either cluster cells without considering the subject heterogeneity or group subjects without using the single-cell information. To bridge the gap between cell clustering and subject grouping, we develop a nonparametric Bayesian model, Subject and Cell clustering for Single-Cell expression data (SCSC) model, to achieve subject and cell grouping simultaneously. SCSC does not need to prespecify the subject subgroup number or the cell type number. It automatically induces subject subgroup structures and matches cell types across subjects. Moreover, it directly models the single-cell raw count data by deliberately considering the data's dropouts, library sizes, and over-dispersion. A blocked Gibbs sampler is proposed for the posterior inference. Simulation studies and the application to a multi-subject iPSC scRNA-seq dataset validate the ability of SCSC to simultaneously cluster subjects and cells.","PeriodicalId":49478,"journal":{"name":"Statistica Sinica","volume":"18 1","pages":"0"},"PeriodicalIF":1.5000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistica Sinica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5705/ss.202020.0337","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 1
Abstract
The advent of single-cell sequencing opens new avenues for personalized treatment. In this paper, we address a two-level clustering problem of simultaneous subject subgroup discovery (subject level) and cell type detection (cell level) for single-cell expression data from multiple subjects. However, current statistical approaches either cluster cells without considering the subject heterogeneity or group subjects without using the single-cell information. To bridge the gap between cell clustering and subject grouping, we develop a nonparametric Bayesian model, Subject and Cell clustering for Single-Cell expression data (SCSC) model, to achieve subject and cell grouping simultaneously. SCSC does not need to prespecify the subject subgroup number or the cell type number. It automatically induces subject subgroup structures and matches cell types across subjects. Moreover, it directly models the single-cell raw count data by deliberately considering the data's dropouts, library sizes, and over-dispersion. A blocked Gibbs sampler is proposed for the posterior inference. Simulation studies and the application to a multi-subject iPSC scRNA-seq dataset validate the ability of SCSC to simultaneously cluster subjects and cells.
期刊介绍:
Statistica Sinica aims to meet the needs of statisticians in a rapidly changing world. It provides a forum for the publication of innovative work of high quality in all areas of statistics, including theory, methodology and applications. The journal encourages the development and principled use of statistical methodology that is relevant for society, science and technology.